Understanding Windows OS Architecture, Win32 and Native APIs

Introduction

In this post we are going to explore the architecture of Windows OS, take a closer look at operational modes and delve into the differences between the API layers, examining their purposes and the role each plays in the process of a user-mode application making a call to the kernel.

The goal of this post is to demystify some of the inner workings of Windows and to explain the many acronyms that come up when talking about evasion techniques or malware in general. These can make things appear more complex than they really are, especially for newcomers.

As red (or blue) teamers, we often need to walk uncommon paths and interact with internal OS components to evade or implement defenses. To do so, I believe it’s important to have a clear understanding of the internals, so that we can make decisions based on the context and the objectives of the assessment, rather than relying on anecdotal evidence or using tool X or Y just because it is trending on Twitter, without really understanding what it is doing and why it is or isn’t working.

Hopefully, this post will give you a solid grasp of the covered topics and will be a good starting point for further research. But enough with the intro, let’s get started!

OS Architecture and Operating Modes

While all the code that runs in kernel space shares a single virtual address space and is allowed to access any memory page (including those in the user address space), a user-mode process runs in an isolated environment with many more restrictions. A user-mode process can be any app run by the user, but also a system service or anything else that is not allowed to access sensitive OS components directly.

The clear separation between the two spaces prevents user applications from inadvertently (or intentionally, in the case of malware) accessing and modifying critical operating system components and data.

The diagram below shows the distinction between the two modes, and a high-level overview of the key components included in each one:

user-mode and kernel-mode components

Starting from the top of the diagram, we have user-mode processes. These processes can be either:

user processes: all processes related to standard user applications, like browsers, code editors, office apps, etc.
service processes: similar to user processes, but they run in the background and don’t have a GUI. In Windows, services almost always run inside an instance of svchost.exe, the service host process.
system processes: critical processes for the normal functioning of the operating system. Crashing or terminating one of these processes would cause a system crash. Examples of system processes are lsass.exe, winlogon.exe and services.exe.
environment subsystem(s): provide the environment and APIs that support the applications. Historically, different subsystems were included with Windows NT (POSIX, OS/2, etc.), but in modern Windows versions only the Win32 subsystem still exists (running from the csrss.exe image). More recently, with Windows 10 version 1607, the Windows Subsystem for Linux (WSL) was introduced to provide support for Linux applications via Pico processes and providers.

User processes make use of APIs exported by subsystem DLLs to access kernel-mode resources. Subsystem DLLs are in fact a collection of dynamic-link libraries, officially documented by Microsoft, which implement the Win32 API. Examples of subsystem DLLs are user32.dll, kernel32.dll, advapi32.dll, and so on.

On the other hand, some service processes and system processes are also native processes, which means that they call native APIs directly. The Windows native API is implemented by ntdll.dll, a system-wide DLL that takes care of the transition from user mode to kernel mode via syscall invocations.

Once in kernel mode, we find the Executive, the upper part of ntoskrnl.exe. The Executive is essentially a group of kernel-mode components that provide a plethora of services to device drivers. The functionality provided by the Windows Executive is split into several subsystems, such as the I/O Manager, IPC Manager, Virtual Memory Manager (VMM), Process Manager, PnP Manager, Power Manager, Security Reference Monitor (SRM), and so on. Grouped together, they can be referred to as Executive services. The Object Manager is a special executive subsystem that manages kernel objects. In fact, all other components must pass through it to gain access to system resources.

Between the Executive and the Hardware Abstraction Layer (HAL), which abstracts the physical hardware and hides the differences between platforms, there are device drivers and the kernel. Device drivers are loadable kernel modules used by Windows NT to interact with hardware devices via a set of exported routines.

Finally, the kernel (the other part of ntoskrnl.exe) implements the most critical part of the OS: it provides multiprocessor synchronization, thread scheduling, and interrupt and exception dispatching. It is also responsible for initializing, at boot, the device drivers that the operating system needs in order to start.

Privilege Rings

Another common way to refer to the access modes is using privilege levels or rings, a hierarchical organization of privilege levels that determine the types of operations that a process or user is allowed to perform. The concept is similar to the distinction between user-mode and kernel-mode in Windows, but it is more general and can be applied to other operating systems as well.

The x86 and x64 processor architectures define four rings, giving the OS a way to protect system code and critical OS data from being tampered with by buggy or malicious code.

privilege rings

To keep its architecture portable and efficient, Windows uses only two privilege rings. Ring 3, the least privileged one, is used for all applications running in user space, while Ring 0, the most privileged one, is for critical OS components (e.g. the kernel, device drivers, etc.) running in kernel space.
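As a rough illustration of how the ring shows up in practice: on x86/x64 the current privilege level is encoded in the two low bits of the code segment selector. The sketch below assumes selector values commonly observed on x64 Windows (0x33 and 0x23 for user mode, 0x10 for kernel mode); the helper name ring_from_selector is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// The privilege level is stored in the two low bits of a segment selector.
constexpr unsigned ring_from_selector(std::uint16_t selector)
{
    return selector & 0x3u;
}

// Selector values typically seen on x64 Windows (assumed for illustration).
constexpr std::uint16_t kUserModeCs64 = 0x33; // 64-bit user-mode code segment
constexpr std::uint16_t kUserModeCs32 = 0x23; // 32-bit (WoW64) user-mode code segment
constexpr std::uint16_t kKernelModeCs = 0x10; // kernel-mode code segment

static_assert(ring_from_selector(kUserModeCs64) == 3, "user mode runs in ring 3");
static_assert(ring_from_selector(kKernelModeCs) == 0, "kernel mode runs in ring 0");
```

In a debugger, reading the cs register of a user-mode thread and of a kernel-mode thread and feeding the values to a helper like this one should give 3 and 0 respectively.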

During its lifetime, a user-mode program may need to access restricted resources or perform privileged operations (e.g. reading a file, creating a process, etc.) that it cannot carry out directly. For all of those operations the user-mode program usually calls Win32 or standard library functions, which go through native functions exported by ntdll.dll, transition to kernel mode via syscalls, and call the corresponding function in ntoskrnl.exe or win32k.sys.

This multi-layer structure also improves the stability of the system, since access to sensitive resources is constrained by the use of APIs. The call goes through multiple checks before reaching the routine at the end of the chain, which has unconstrained access to the resource. Application developers typically don’t need to call native functions in their code; they can instead rely on several higher-level APIs that take care of setting everything up before actually switching to kernel mode. By contrast, developers who write kernel-mode code need to be extra careful with every operation, since there’s a high chance that an incorrect address or improper handling of a resource will cause a BSOD.

Win32 API

The Win32 API is the developer-friendly API: a collection of functions and structs that are exported by subsystem DLLs and used by user-mode processes to interact with the operating system. Almost all of its functions and structs have a dedicated page in the reference documentation at learn.microsoft.com.

To see how the layered architecture of Windows works in practice, let’s take a look at the following example.

Let’s say we want to create a test file in the C:\Temp directory. We can do that by calling the CreateFileW function from kernel32.dll.

#include <windows.h>
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE

int wmain()
{
    HANDLE hFile = ::CreateFileW(
        L"C:\\Temp\\test.txt",
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        CREATE_NEW,
        FILE_ATTRIBUTE_TEMPORARY,
        NULL);

    if (hFile == INVALID_HANDLE_VALUE) {
        return EXIT_FAILURE;
    }

    ::CloseHandle(hFile);
    return EXIT_SUCCESS;
}

If we compile the program and start a capture with Procmon using the following filters:

Process Name is file01.exe (or the name you used for the compiled executable)
Operation is CreateFile
Path ends with test.txt

we should now see a CreateFile event, generated by the process when kernel32!CreateFileW is invoked.

CreateFileW event

As expected, the file creation operation performed by the program with kernel32!CreateFileW results in a series of calls to the corresponding native and kernel-mode functions.

CreateFileW call stack

In this case Procmon displays the call as originating from kernelbase.dll instead of kernel32.dll, because kernel32!CreateFileW simply forwards the call to the CreateFileW function imported from api-ms-win-core-file-l1-1-0.dll (an API set name that resolves to kernelbase.dll), as we can clearly see if we open kernel32.dll with Binary Ninja. This is also explained more thoroughly in this post by @jaredcatkinson.

CreateFileW kernel32

CreateFileW kernelbase
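Besides forwarding the call, the Win32 layer also translates its arguments into their native equivalents before NtCreateFile is reached. As a hedged sketch of one such translation, the snippet below maps the documented dwCreationDisposition values of CreateFileW to the CreateDisposition values accepted by NtCreateFile. The constants are redefined locally under made-up names so the example does not depend on the Windows headers, and to_nt_disposition is a hypothetical helper, not the actual kernelbase.dll code.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Win32 dwCreationDisposition values (as documented for CreateFileW).
enum : std::uint32_t {
    kCreateNew = 1, kCreateAlways = 2, kOpenExisting = 3,
    kOpenAlways = 4, kTruncateExisting = 5
};

// Native CreateDisposition values (as documented for NtCreateFile).
enum : std::uint32_t {
    kFileSupersede = 0, kFileOpen = 1, kFileCreate = 2,
    kFileOpenIf = 3, kFileOverwrite = 4, kFileOverwriteIf = 5
};

// Hypothetical helper: returns the native disposition, or -1 if the
// Win32 value is not one of the five documented ones.
inline int to_nt_disposition(std::uint32_t win32Disposition)
{
    switch (win32Disposition) {
    case kCreateNew:        return kFileCreate;      // fail if the file exists
    case kCreateAlways:     return kFileOverwriteIf; // create or overwrite
    case kOpenExisting:     return kFileOpen;        // fail if the file is missing
    case kOpenAlways:       return kFileOpenIf;      // open or create
    case kTruncateExisting: return kFileOverwrite;   // fail if the file is missing
    default:                return -1;
    }
}

} // namespace sketch
```

With this mapping, the CREATE_NEW flag used in the example above corresponds to FILE_CREATE, while the FILE_OVERWRITE_IF value passed in the native example later in this post is the counterpart of CREATE_ALWAYS.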

Windows NT API

A step down in the hierarchy, we can find the native API (sometimes also called NTAPI), which we can imagine as the lowest user-mode layer traversed by a call before entering the OS kernel. Most of the API is implemented in ntdll.dll and at the upper edge of ntoskrnl.exe and its variants.

ntdll.dll contains two types of functions:

System service dispatch stubs responsible for transitioning from user-mode to kernel-mode and calling the corresponding kernel-mode function.
Internal support functions used by subsystems and higher-level DLLs

The majority of the symbols exported for this purpose are prefixed with Nt or Zw. In user mode the two prefixes are interchangeable: in ntdll.dll, an Nt* function and its Zw* counterpart refer to the same stub. The distinction only matters in kernel mode, where calling a Zw* routine sets the previous mode to kernel mode, so the parameters are treated as trusted, while calling the Nt* routine directly leaves the previous mode unchanged. In user mode, those functions are essentially gates to enter the kernel, since the real implementation is in the corresponding kernel-mode routine.
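To make the idea of a dispatch stub more concrete, here is a minimal sketch that reads the system service number out of the first bytes of an x64 stub. It assumes the usual prologue of mov r10, rcx followed by mov eax, imm32. The sample bytes are a simplified stub written by hand for this example (real stubs on recent Windows builds contain a few more instructions before the syscall), and the service number 0x55 is only illustrative, since these numbers change between Windows builds. The function name extract_ssn is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns the system service number encoded in an x64 stub, or -1 if the
// bytes do not start with "mov r10, rcx; mov eax, imm32".
inline std::int64_t extract_ssn(const std::uint8_t* stub, std::size_t size)
{
    if (stub == nullptr || size < 8) {
        return -1;
    }
    // 4C 8B D1 = mov r10, rcx ; B8 = opcode of mov eax, imm32
    if (stub[0] != 0x4C || stub[1] != 0x8B || stub[2] != 0xD1 || stub[3] != 0xB8) {
        return -1;
    }
    // The 32-bit immediate is stored in little-endian order.
    const std::uint32_t ssn = static_cast<std::uint32_t>(stub[4]) |
                              (static_cast<std::uint32_t>(stub[5]) << 8) |
                              (static_cast<std::uint32_t>(stub[6]) << 16) |
                              (static_cast<std::uint32_t>(stub[7]) << 24);
    return static_cast<std::int64_t>(ssn);
}

// Simplified, hand-written sample: mov r10, rcx; mov eax, 0x55; syscall; ret
const std::uint8_t kSampleStub[] = {0x4C, 0x8B, 0xD1, 0xB8, 0x55, 0x00,
                                    0x00, 0x00, 0x0F, 0x05, 0xC3};

// Bytes that do not follow the expected prologue.
const std::uint8_t kNotAStub[] = {0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90};
```

On a live system, the same helper could be pointed at the address returned by GetProcAddress for an Nt* export, under the assumption that the stub has not been modified.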

To use ntdll.dll functions that are not declared in the Windows Kits headers, we must explicitly declare the prototypes of the ones we want to use and resolve their addresses dynamically. For instance, in C++ code we can use the following template:

typedef NTSTATUS (NTAPI* <FunctionName>)(
    <ParameterType> <ParameterName>,
    ...
);

Let’s go over each part of the function prototype:

extern "C" (which wraps the declarations in the header shown later in this section) is a linkage specification that tells the compiler to use C linkage for the function, so its name is not mangled as it would be for a regular C++ function.
NTSTATUS is the return type of the function. It’s a typedef for LONG, and it’s used to indicate the success or failure of the function. The NTSTATUS type is defined in the ntdef.h header file.
NTAPI is a macro defined in winnt.h that tells the compiler to use the NT calling convention for the function (it expands to __stdcall).
<FunctionName>, <ParameterType> and <ParameterName> are the name of the function pointer type and the type and name of each parameter. Parameters can carry data-flow annotations (such as _In_ or _Out_) and everything else you would expect from a function prototype.
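Since every native function reports its outcome through an NTSTATUS, it helps to see how the value is interpreted. The sketch below mirrors the logic of the NT_SUCCESS macro (a status is successful when it is non-negative) and extracts the severity stored in the two highest bits. NtStatus, nt_success and severity are stand-in names chosen for this example so that it does not need the Windows headers; the status constants use the documented numeric values.

```cpp
#include <cassert>
#include <cstdint>

using NtStatus = std::int32_t; // stand-in for NTSTATUS, a 32-bit signed value

// Mirrors NT_SUCCESS: success and informational codes are non-negative.
constexpr bool nt_success(NtStatus status)
{
    return status >= 0;
}

// Severity lives in the top two bits:
// 0 = success, 1 = informational, 2 = warning, 3 = error.
constexpr unsigned severity(NtStatus status)
{
    return static_cast<std::uint32_t>(status) >> 30;
}

constexpr NtStatus kStatusSuccess        = 0x00000000;
constexpr NtStatus kStatusPending        = 0x00000103;
constexpr NtStatus kStatusBufferOverflow = static_cast<NtStatus>(0x80000005u);
constexpr NtStatus kStatusAccessDenied   = static_cast<NtStatus>(0xC0000022u);
```

Note that a warning such as STATUS_BUFFER_OVERFLOW is not considered a success by this check, even though it is not an error either.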

Let’s rewrite the above example code, and create the file using Nt* functions. This time we can’t do everything with just a single API call, but we need to call the following functions:

RtlInitUnicodeString to initialize a UNICODE_STRING structure.
InitializeObjectAttributes, a macro that initializes an OBJECT_ATTRIBUTES structure, which is used to specify the attributes of an object, such as its name, security descriptor, and other properties.
NtCreateFile to create the file.

Their definitions can be found in the winternl.h header file, which contains declarations for many of the functions and types that are part of the Windows NT API. Once modified, the new code to create the file with native API functions should look something like this:

File: ntapi.h

#ifndef __NTAPI_H
#define __NTAPI_H

#include <ntdef.h>
#include <windows.h>

#ifdef __cplusplus
extern "C"{
#endif

typedef struct _IO_STATUS_BLOCK
{
    union {
        NTSTATUS Status;
        PVOID Pointer;
    } DUMMYUNIONNAME;
    ULONG_PTR Information;
} IO_STATUS_BLOCK, *PIO_STATUS_BLOCK;

typedef VOID(
    NTAPI* pRtlInitUnicodeString)(PUNICODE_STRING DestinationString, PCWSTR SourceString);

typedef NTSTATUS(NTAPI* pNtCreateFile)(
    PHANDLE FileHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PIO_STATUS_BLOCK IoStatusBlock,
    PLARGE_INTEGER AllocationSize,
    ULONG FileAttributes,
    ULONG ShareAccess,
    ULONG CreateDisposition,
    ULONG CreateOptions,
    PVOID EaBuffer,
    ULONG EaLength);

#ifdef __cplusplus
}
#endif

#endif /* __NTAPI_H */

File: main.cpp

#include "ntapi.h"
#include <windows.h>

int wmain()
{
    HANDLE hFile;
    NTSTATUS status;
    UNICODE_STRING fpathU;
    OBJECT_ATTRIBUTES obj;
    IO_STATUS_BLOCK isb;
    LPCWSTR filepath = L"\\??\\C:\\Temp\\test.txt";

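    // resolve function addresses from ntdll.dll, which is already loaded
    HMODULE hNtdll = ::GetModuleHandleW(L"ntdll.dll");
    if (hNtdll == NULL) {
        return EXIT_FAILURE;
    }
    pNtCreateFile NtCreateFile =
        (pNtCreateFile)::GetProcAddress(hNtdll, "NtCreateFile");
    pRtlInitUnicodeString RtlInitUnicodeString =
        (pRtlInitUnicodeString)::GetProcAddress(hNtdll, "RtlInitUnicodeString");
    if (NtCreateFile == NULL || RtlInitUnicodeString == NULL) {
        return EXIT_FAILURE;
    }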

    // add wide char string to UNICODE_STRING struct
    RtlInitUnicodeString(&fpathU, filepath);

    // add filepath to object attributes
    InitializeObjectAttributes(&obj, &fpathU, OBJ_CASE_INSENSITIVE, NULL, NULL);

    // create file with native API
    status = NtCreateFile(
        &hFile,
        FILE_GENERIC_WRITE,
        &obj,
        &isb,
        0,
        FILE_ATTRIBUTE_NORMAL,
        FILE_SHARE_READ,
        FILE_OVERWRITE_IF,
        FILE_NON_DIRECTORY_FILE,
        NULL,
        0);

    if (!NT_SUCCESS(status)) {
        // no handle is returned on failure, so there is nothing to close
        return EXIT_FAILURE;
    }

    ::CloseHandle(hFile);
    return EXIT_SUCCESS;
}
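The UNICODE_STRING structure used above is a counted string: it stores lengths in bytes next to a pointer to the caller's buffer, without copying it. The sketch below shows roughly what an initializer like RtlInitUnicodeString has to compute. SketchUnicodeString and make_unicode_string are hypothetical stand-ins, char16_t is used in place of WCHAR so the example also builds where wchar_t is not 16 bits wide, and the cap applied to very long strings is a simplification of this sketch rather than a statement about the real routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for UNICODE_STRING.
struct SketchUnicodeString {
    std::uint16_t Length;        // size in bytes, excluding the terminating null
    std::uint16_t MaximumLength; // size in bytes, including the terminating null
    const char16_t* Buffer;      // points at the caller's string, nothing is copied
};

inline SketchUnicodeString make_unicode_string(const char16_t* source)
{
    SketchUnicodeString result = {0, 0, source};
    if (source == nullptr) {
        return result;
    }
    std::size_t chars = 0;
    while (source[chars] != u'\0') {
        ++chars;
    }
    std::size_t bytes = chars * sizeof(char16_t);
    // Simplification: cap the length so both fields fit in 16 bits.
    if (bytes > 0xFFFC) {
        bytes = 0xFFFC;
    }
    result.Length = static_cast<std::uint16_t>(bytes);
    result.MaximumLength = static_cast<std::uint16_t>(bytes + sizeof(char16_t));
    return result;
}
```

For the 20-character path used in the example, this gives a Length of 40 bytes and a MaximumLength of 42 bytes. Because only the pointer is stored, the source string has to outlive the structure.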

Using Procmon with the same filters that we used before, we should now see a single call to ntdll!NtCreateFile as the only user-mode call:

procmon capture with NtCreateFile

As expected, in the call stack for the file creation event there are no more calls to kernel32 APIs, but just a direct call to ntdll!NtCreateFile before switching to kernel mode.

Why use native API structs and calls, which may not be well documented, instead of their higher-level counterparts that are more reliable and quicker to use? For most app developers, it doesn’t offer any benefit. But in some scenarios, you might need to go for the low-level stuff. Let’s say that for some reason you want to bypass an anti-malware solution or an anti-cheat engine that monitors potentially malicious APIs. In that case you may need more control over the process of accessing system services, and want to avoid calling runtime functions or Win32 APIs. Obviously, that’s not a guaranteed bypass, and in 2023 evading a decent EDR is usually much more complex than just calling native APIs, but that’s a topic for another post.

Wrap Up

That’s a wrap on this foundational post! I hope you found it useful and not too boring. As always, feel free to reach out to me on Twitter if you have any doubts or spot any mistakes in this post.

Thanks for reading and see you in the next post!
