sam-b.github.io

Revisiting Windows Security Hardening Through Kernel Address Protection

Back in 2011 when Windows 7 Service Pack 1 was king of the hill and I was just starting to learn to program (via Harvard’s epic CS50), j00ru published a whitepaper on various ways to access Windows kernel pointers from User Mode: Windows Security Hardening Through Kernel Address Protection.

I decided to revisit the techniques outlined in the paper, get versions of them working on Windows 7, and then check whether they still work on Windows 8/8.1/10. Where they don’t, I look into how the functions have been modified in the newer versions of Windows. A lot of this has been covered elsewhere, but I learned a reasonable amount doing this and it was a fun reversing exercise, so hopefully people will find it useful :)

For each example I will walk through writing a working implementation for Windows 7 32 bit and running it, then port it to 64 bit Windows. If the technique is found to no longer work on newer versions of Windows, I’ll add notes on what changed in the feature being used.

All the Visual Studio projects for each technique can be grabbed in a single repository on Github.

Windows System Information classes

The NtQuerySystemInformation is a classic and very well known undocumented function which, using reverse engineered details, can be used to gather information about the state of the Windows kernel. It is defined on MSDN as follows:

NTSTATUS WINAPI NtQuerySystemInformation(
  _In_      SYSTEM_INFORMATION_CLASS SystemInformationClass,
  _Inout_   PVOID                    SystemInformation,
  _In_      ULONG                    SystemInformationLength,
  _Out_opt_ PULONG                   ReturnLength
);

The first argument is a value in the SYSTEM_INFORMATION_CLASS enum; this is the value which decides what information is returned. Some of the values for this enum can be found in winternl.h and others have been reverse engineered (for example, here’s the Wine project’s implementation). In j00ru’s paper he looked at 4 of the enum values, which will be elaborated on individually below.

The 2nd and 3rd parameters are a pointer to a structure for the output data, which varies depending on the SystemInformationClass value, and its size. The final parameter is used to return how much data was written to the output structure.

In order to avoid repeating code for each of the SystemInformationClass values, I’ll give the code for actually defining and calling NtQuerySystemInformation here. First of all we’ll include the standard Visual Studio project header and a full import of Windows.h which defines many of the Windows specific structures and functions we’ll need to use.

#include "stdafx.h"
#include <windows.h>

We’ll also need to define the NtQuerySystemInformation function in such a way that we can cast a pointer to it and then call it.

typedef NTSTATUS(WINAPI *PNtQuerySystemInformation)(
	__in SYSTEM_INFORMATION_CLASS SystemInformationClass,
	__inout PVOID SystemInformation,
	__in ULONG SystemInformationLength,
	__out_opt PULONG ReturnLength
);

Finally we’ll need to locate the NtQuerySystemInformation function within ntdll. This is done by getting a handle to ntdll, finding the procedure address within it and then quickly checking that it has been found successfully.

HMODULE ntdll = GetModuleHandle(TEXT("ntdll"));
PNtQuerySystemInformation query = (PNtQuerySystemInformation)GetProcAddress(ntdll, "NtQuerySystemInformation");
if (query == NULL) {
	printf("GetProcAddress() failed.\n");
	return 1;
}

Once this code has run the query variable can be called like a function.

#Windows 7 32 bit

##SystemModuleInformation

The first SystemInformationClass value looked at is SystemModuleInformation. The data returned when this value is used includes the kernel space base address of every currently loaded driver, along with its name and size.

First of all we need to define the SYSTEM_INFORMATION_CLASS enum value we will pass to NtQuerySystemInformation later, in this case it’s 11 as shown below.

typedef enum _SYSTEM_INFORMATION_CLASS {
	SystemModuleInformation = 11
} SYSTEM_INFORMATION_CLASS;

Next we need to define the structure that NtQuerySystemInformation will load information into when called with SystemModuleInformation (taken from reactos/wine line 2129 onward).

#define MAXIMUM_FILENAME_LENGTH 255 

typedef struct SYSTEM_MODULE {
	ULONG                Reserved1;
	ULONG                Reserved2;
	PVOID                ImageBaseAddress;
	ULONG                ImageSize;
	ULONG                Flags;
	WORD                 Id;
	WORD                 Rank;
	WORD                 w018;
	WORD                 NameOffset;
	BYTE                 Name[MAXIMUM_FILENAME_LENGTH];
}SYSTEM_MODULE, *PSYSTEM_MODULE;

typedef struct SYSTEM_MODULE_INFORMATION {
	ULONG                ModulesCount;
	SYSTEM_MODULE        Modules[1];
} SYSTEM_MODULE_INFORMATION, *PSYSTEM_MODULE_INFORMATION;

As you can see, the SYSTEM_MODULE structure includes the ImageBaseAddress, ImageSize and Name fields, which are what we’re interested in. In order to find out how much memory we need to allocate we have to call NtQuerySystemInformation with the SystemModuleInformation enum value and a NULL output pointer. When this is done it will load the number of bytes needed into the ReturnLength parameter.

ULONG len = 0;
query(SystemModuleInformation, NULL, 0, &len);

Now that we know how much memory is needed we can allocate a SYSTEM_MODULE_INFORMATION structure of the right size and call NtQuerySystemInformation again.

PSYSTEM_MODULE_INFORMATION pModuleInfo = (PSYSTEM_MODULE_INFORMATION)GlobalAlloc(GMEM_ZEROINIT, len);
if (pModuleInfo == NULL) {
	printf("Could not allocate memory for module info.\n");
	return 1;
}

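// Check the returned NTSTATUS (negative values are errors) rather than only the length.
NTSTATUS status = query(SystemModuleInformation, pModuleInfo, len, &len);
if (status < 0 || len == 0) {
	printf("Failed to retrieve system module information.\r\n");
	GlobalFree(pModuleInfo);
	return 1;
}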

After verifying that everything has returned without any errors, we can use the ModulesCount field to iterate over the SYSTEM_MODULE array and print the key details about each module.

for (ULONG i = 0; i < pModuleInfo->ModulesCount; i++) {
	PVOID kernelImageBase = pModuleInfo->Modules[i].ImageBaseAddress;
	PCHAR kernelImage = (PCHAR)pModuleInfo->Modules[i].Name;
	printf("Module name %s\t", kernelImage);
	printf("Base Address 0x%X\r\n", kernelImageBase);
}
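The Name field holds the full path of each image, which makes it awkward to pick out a particular driver. A minimal sketch of one way to get just the file name, assuming (as other public headers suggest by calling the field OffsetToFileName) that NameOffset is the offset of the file name within Name. MODULE_ENTRY and module_file_name are hypothetical names, and the struct is a trimmed stand-in for SYSTEM_MODULE so the snippet builds without the Windows headers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for SYSTEM_MODULE: only the two fields
   needed to show how NameOffset could be used. */
typedef struct {
	uint16_t      NameOffset;  /* assumed: offset of the file name inside Name */
	unsigned char Name[255];   /* full path of the image */
} MODULE_ENTRY;

/* Return the file name part of the module path. Falls back to the full
   path if the offset is out of range, or "" if Name is not terminated. */
static const char *module_file_name(const MODULE_ENTRY *m)
{
	if (memchr(m->Name, '\0', sizeof(m->Name)) == NULL) {
		return "";
	}
	if (m->NameOffset > strlen((const char *)m->Name)) {
		return (const char *)m->Name;
	}
	return (const char *)m->Name + m->NameOffset;
}
```

In the loop above this would be the equivalent of printing `Name + NameOffset` instead of `Name`.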

Building and then running the code gave the following output. The full code for this example (including the changes for running it on 64 bit Windows discussed later) can be grabbed from Github.

##SystemHandleInformation

The second SystemInformationClass value mentioned in j00ru’s paper is SystemHandleInformation. This gives the HANDLE values and kernel memory pointers for every object opened by every process, including all Token objects. I’ll be using the extended version of SystemHandleInformation since, as discussed in the paper, the original version only gives 16 bits of the HANDLE value, which may not be enough in some circumstances. First of all we again define the correct SYSTEM_INFORMATION_CLASS value.

typedef enum _SYSTEM_INFORMATION_CLASS {
	SystemExtendedHandleInformation = 64
} SYSTEM_INFORMATION_CLASS;

Next we need to define the output structure (Taken from Process Hacker line 1595 onward).

typedef struct _SYSTEM_HANDLE
{
	PVOID Object;
	HANDLE UniqueProcessId;
	HANDLE HandleValue;
	ULONG GrantedAccess;
	USHORT CreatorBackTraceIndex;
	USHORT ObjectTypeIndex;
	ULONG HandleAttributes;
	ULONG Reserved;
} SYSTEM_HANDLE, *PSYSTEM_HANDLE;

typedef struct _SYSTEM_HANDLE_INFORMATION_EX
{
	ULONG_PTR HandleCount;
	ULONG_PTR Reserved;
	SYSTEM_HANDLE Handles[1];
} SYSTEM_HANDLE_INFORMATION_EX, *PSYSTEM_HANDLE_INFORMATION_EX;

As you can see the output structure includes the HandleValue for each object and the Object field which is a pointer to its location in memory.

NtQuerySystemInformation has a weird API for using this SystemInformationClass value. Instead of returning the memory needed when called with a NULL pointer, it just returns the NTSTATUS code 0xC0000004. This is the code for STATUS_INFO_LENGTH_MISMATCH, which is returned if too little memory has been allocated for the output to be written to. In order to deal with this I allocate a small amount of memory for the output and then keep calling NtQuerySystemInformation with double the memory each time until it returns a different status code.

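ULONG len = 20;
NTSTATUS status = (NTSTATUS)0xc0000004;
PSYSTEM_HANDLE_INFORMATION_EX pHandleInfo = NULL;
do {
	len *= 2;
	// Free the previous, too small buffer before allocating a larger one.
	if (pHandleInfo != NULL) {
		GlobalFree(pHandleInfo);
	}
	pHandleInfo = (PSYSTEM_HANDLE_INFORMATION_EX)GlobalAlloc(GMEM_ZEROINIT, len);
	if (pHandleInfo == NULL) {
		printf("Could not allocate memory for handle info.\n");
		return 1;
	}

	status = query(SystemExtendedHandleInformation, pHandleInfo, len, &len);

} while (status == (NTSTATUS) 0xc0000004);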

Once enough memory has been allocated the function should return successfully and we can iterate over the output as before, printing the values we are interested in.

for (ULONG_PTR i = 0; i < pHandleInfo->HandleCount; i++) {
	PVOID object = pHandleInfo->Handles[i].Object;
	HANDLE handle = pHandleInfo->Handles[i].HandleValue;
	HANDLE pid = pHandleInfo->Handles[i].UniqueProcessId;
	printf("PID: %d\t", pid);
	printf("Object 0x%X\t", object);
	printf("Handle 0x%X\r\n", handle);
}

Building and then running the code gave me the following output :)

The full code as well as the changes for it to run on 64 bit Windows (which are discussed later) can be found on Github.

##SystemLockInformation

The third SystemInformationClass value examined in j00ru’s paper is SystemLockInformation. This returns the details and kernel memory address of every Lock object that currently exists. Once again we’ll first define the correct SYSTEM_INFORMATION_CLASS value.

typedef enum _SYSTEM_INFORMATION_CLASS {
	SystemLockInformation = 12
} SYSTEM_INFORMATION_CLASS;

Next we need to define the output structure. For this I took the structure definition from j00ru’s paper and assumed the container structure holding the LocksCount value would follow the same pattern as the other structures.

typedef struct _SYSTEM_LOCK {
	PVOID	Address;
	USHORT  Type;
	USHORT  Reserved1;
	ULONG	ExclusiveOwnerThreadId;
	ULONG	ActiveCount;
	ULONG	ContentionCount;
	ULONG	Reserved2[2];
	ULONG	NumberOfSharedWaiters;
	ULONG	NumberOfExclusiveWaiters;
} SYSTEM_LOCK, *PSYSTEM_LOCK;

typedef struct SYSTEM_LOCK_INFORMATION {
	ULONG              LocksCount;
	SYSTEM_LOCK        Locks[1];
} SYSTEM_LOCK_INFORMATION, *PSYSTEM_LOCK_INFORMATION;

The key value to note for the SYSTEM_LOCK structure is the Address field which is a pointer to the object in kernel memory.

Much like with SystemExtendedHandleInformation, we can’t just get NtQuerySystemInformation to give us the needed output buffer size, so we resort to calling it in a loop until it stops giving the length mismatch error code.

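PSYSTEM_LOCK_INFORMATION pLockInfo = NULL;
ULONG len = 20;
NTSTATUS status = (NTSTATUS)0xc0000004;

do {
	len *= 2;
	// Free the previous, too small buffer before allocating a larger one.
	if (pLockInfo != NULL) {
		GlobalFree(pLockInfo);
	}
	pLockInfo = (PSYSTEM_LOCK_INFORMATION)GlobalAlloc(GMEM_ZEROINIT, len);
	if (pLockInfo == NULL) {
		printf("Could not allocate memory for lock info.\n");
		return 1;
	}
	status = query(SystemLockInformation, pLockInfo, len, &len);
} while (status == (NTSTATUS)0xc0000004);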
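Since the same grow-and-retry loop is needed for several information classes, it could be factored into a helper. The following is only a sketch under stated assumptions: it uses plain C types and calloc/free instead of the Windows types and GlobalAlloc so that it builds anywhere, and query_fn, query_with_growing_buffer and fake_query are hypothetical names (fake_query is a stand-in that pretends the kernel needs 20000 bytes, not the real API).

```c
#include <stdint.h>
#include <stdlib.h>

#define STATUS_INFO_LENGTH_MISMATCH ((int32_t)0xC0000004)

/* Same shape as NtQuerySystemInformation, expressed with plain C types. */
typedef int32_t (*query_fn)(int info_class, void *buf, uint32_t len, uint32_t *ret_len);

/* Keep doubling the buffer until the query stops reporting a length
   mismatch. Each too small buffer is freed before the retry. Returns the
   filled buffer (caller frees it) or NULL on failure; *status receives
   the last status code, or -1 for a local allocation failure. */
static void *query_with_growing_buffer(query_fn query, int info_class,
                                       uint32_t *out_len, int32_t *status)
{
	uint32_t len = 0x1000;
	for (;;) {
		void *buf = calloc(1, len);
		if (buf == NULL) {
			*status = -1;
			return NULL;
		}
		*status = query(info_class, buf, len, NULL);
		if (*status != STATUS_INFO_LENGTH_MISMATCH) {
			if (*status < 0) {          /* some other error */
				free(buf);
				return NULL;
			}
			*out_len = len;
			return buf;
		}
		free(buf);
		if (len > UINT32_MAX / 2) {     /* avoid overflowing len */
			*status = -1;
			return NULL;
		}
		len *= 2;
	}
}

/* Hypothetical stand-in for the real call: succeeds once the buffer is
   at least 20000 bytes long. */
static int32_t fake_query(int info_class, void *buf, uint32_t len, uint32_t *ret_len)
{
	(void)info_class; (void)buf;
	if (ret_len != NULL) *ret_len = 20000;
	return len < 20000 ? STATUS_INFO_LENGTH_MISMATCH : 0;
}
```

With the real function pointer in place of fake_query, each example would reduce to one call plus a cast of the returned buffer.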

Once enough memory has been allocated and the output has been written, we can iterate over the output, printing the fields of interest.

for (ULONG i = 0; i < pLockInfo->LocksCount; i++) {
	PVOID lockAddress = pLockInfo->Locks[i].Address;
	USHORT lockType = (USHORT)pLockInfo->Locks[i].Type;
	printf("Lock Address 0x%X\t", lockAddress);
	printf("Lock Type 0x%X\r\n", lockType);
}

This can then be successfully run on Windows 7 32 bit:

The full code including the changes for 64 bit Windows can be grabbed from Github.

##SystemExtendedProcessInformation

The final SystemInformationClass value mentioned in j00ru’s paper is SystemExtendedProcessInformation. This returns detailed information about all processes and threads running on the system, including the addresses of each thread’s user and kernel mode stacks. We start by defining the correct SYSTEM_INFORMATION_CLASS value.

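typedef enum _SYSTEM_INFORMATION_CLASS {
	// Note: 57 is the value public headers (e.g. Process Hacker's) name SystemExtendedProcessInformation.
	// The identifier below is kept as-is so that it matches the rest of the code.
	SystemSessionProcessInformation = 57
} SYSTEM_INFORMATION_CLASS;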

Next we need to define all of the output structures, these were taken from the BOINC project at Berkeley. We could probably get away with just leaving some of the fields as raw pointers but since it had been pretty thoroughly reversed it seemed worth using the full structure definitions.

typedef LONG       KPRIORITY;
typedef struct _CLIENT_ID {
	DWORD          UniqueProcess;
	DWORD          UniqueThread;
} CLIENT_ID;

typedef struct _UNICODE_STRING {
	USHORT         Length;
	USHORT         MaximumLength;
	PWSTR          Buffer;
} UNICODE_STRING;

typedef struct _VM_COUNTERS {
	SIZE_T         PeakVirtualSize;
	SIZE_T         VirtualSize;
	ULONG          PageFaultCount;
	SIZE_T         PeakWorkingSetSize;
	SIZE_T         WorkingSetSize;
	SIZE_T         QuotaPeakPagedPoolUsage;
	SIZE_T         QuotaPagedPoolUsage;
	SIZE_T         QuotaPeakNonPagedPoolUsage;
	SIZE_T         QuotaNonPagedPoolUsage;
	SIZE_T         PagefileUsage;
	SIZE_T         PeakPagefileUsage;
} VM_COUNTERS;

typedef enum _KWAIT_REASON
{
	Executive = 0,
	FreePage = 1,
	PageIn = 2,
	PoolAllocation = 3,
//SNIP
	WrRundown = 36,
	MaximumWaitReason = 37
} KWAIT_REASON;

typedef struct _SYSTEM_THREAD_INFORMATION{
	LARGE_INTEGER KernelTime;
	LARGE_INTEGER UserTime;
	LARGE_INTEGER CreateTime;
	ULONG WaitTime;
	PVOID StartAddress;
	CLIENT_ID ClientId;
	KPRIORITY Priority;
	LONG BasePriority;
	ULONG ContextSwitches;
	ULONG ThreadState;
	KWAIT_REASON WaitReason;
} SYSTEM_THREAD_INFORMATION, *PSYSTEM_THREAD_INFORMATION;

typedef struct _SYSTEM_EXTENDED_THREAD_INFORMATION
{
	SYSTEM_THREAD_INFORMATION ThreadInfo;
	PVOID StackBase;
	PVOID StackLimit;
	PVOID Win32StartAddress;
	PVOID TebAddress;
	ULONG Reserved1;
	ULONG Reserved2;
	ULONG Reserved3;
} SYSTEM_EXTENDED_THREAD_INFORMATION, *PSYSTEM_EXTENDED_THREAD_INFORMATION;

typedef struct _SYSTEM_EXTENDED_PROCESS_INFORMATION
{
	ULONG NextEntryOffset;
	ULONG NumberOfThreads;
	LARGE_INTEGER SpareLi1;
	LARGE_INTEGER SpareLi2;
	LARGE_INTEGER SpareLi3;
	LARGE_INTEGER CreateTime;
	LARGE_INTEGER UserTime;
	LARGE_INTEGER KernelTime;
	UNICODE_STRING ImageName;
	KPRIORITY BasePriority;
	ULONG UniqueProcessId;
	ULONG InheritedFromUniqueProcessId;
	ULONG HandleCount;
	ULONG SessionId;
	PVOID PageDirectoryBase;
	VM_COUNTERS VirtualMemoryCounters;
	SIZE_T PrivatePageCount;
	IO_COUNTERS IoCounters;
	SYSTEM_EXTENDED_THREAD_INFORMATION Threads[1];
} SYSTEM_EXTENDED_PROCESS_INFORMATION, *PSYSTEM_EXTENDED_PROCESS_INFORMATION;

The key values we are interested in from these structures are the StackBase and StackLimit fields, which give the start of a thread’s kernel mode stack and its limit.

Once more NtQuerySystemInformation won’t just tell us how much memory we need to allocate, so we call it in a loop.

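ULONG len = 20;
NTSTATUS status = (NTSTATUS)0xc0000004;
PSYSTEM_EXTENDED_PROCESS_INFORMATION pProcessInfo = NULL;
do {
	len *= 2;
	// Free the previous, too small buffer before allocating a larger one.
	if (pProcessInfo != NULL) {
		GlobalFree(pProcessInfo);
	}
	pProcessInfo = (PSYSTEM_EXTENDED_PROCESS_INFORMATION)GlobalAlloc(GMEM_ZEROINIT, len);
	if (pProcessInfo == NULL) {
		printf("Could not allocate memory for process info.\n");
		return 1;
	}
	status = query(SystemSessionProcessInformation, pProcessInfo, len, &len);
} while (status == (NTSTATUS)0xc0000004);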

Once the function has been successfully called we can print out the StackBase and StackLimit values for every thread running on the system :D

In order to do this we iterate over all the ProcessInfo structures and then iterate over each thread contained within, printing the values of interest.

// Walk every entry, including the last one, whose NextEntryOffset is 0.
PSYSTEM_EXTENDED_PROCESS_INFORMATION pCurrent = pProcessInfo;
for (;;) {
	for (unsigned int i = 0; i < pCurrent->NumberOfThreads; i++) {
		PVOID stackBase = pCurrent->Threads[i].StackBase;
		PVOID stackLimit = pCurrent->Threads[i].StackLimit;
		printf("Stack base 0x%X\t", stackBase);
		printf("Stack limit 0x%X\r\n", stackLimit);
	}
	if (pCurrent->NextEntryOffset == 0) {
		break;
	}
	pCurrent = (PSYSTEM_EXTENDED_PROCESS_INFORMATION)((ULONG_PTR)pCurrent + pCurrent->NextEntryOffset);
}
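The walk over variable length records chained by NextEntryOffset can be illustrated without the Windows structures. This is a portable sketch, not the code from the repository: RECORD_HEADER and count_threads are hypothetical names, and the header keeps only the two fields that matter for the traversal. It also bounds-checks each offset against the buffer size, which the example above does not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: like SYSTEM_EXTENDED_PROCESS_INFORMATION,
   each record starts with the byte offset to the next record, and the
   last record has an offset of 0. */
typedef struct {
	uint32_t NextEntryOffset;
	uint32_t NumberOfThreads;
} RECORD_HEADER;

/* Sum NumberOfThreads over every record in the buffer, including the
   final record whose NextEntryOffset is 0. *records receives the number
   of records visited. */
static uint32_t count_threads(const unsigned char *buf, size_t size, uint32_t *records)
{
	size_t offset = 0;
	uint32_t threads = 0;
	*records = 0;
	while (size - offset >= sizeof(RECORD_HEADER)) {
		RECORD_HEADER h;
		memcpy(&h, buf + offset, sizeof(h));   /* avoids unaligned reads */
		threads += h.NumberOfThreads;
		(*records)++;
		/* Stop at the last record, or if the offset would leave the buffer. */
		if (h.NextEntryOffset == 0 || h.NextEntryOffset > size - offset) {
			break;
		}
		offset += h.NextEntryOffset;
	}
	return threads;
}
```

The important detail is that the record with NextEntryOffset equal to 0 is still processed before the loop ends.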

Below you can see the output of this being run on Windows 7 32 bit:

The full code for this example and the changes needed to run it on 64 bit systems can be found on Github.

#Windows 8 64 bit

In order to make use of these information leaks on Windows 8 64 bit, some minor changes needed to be made to each example; these are outlined below. All of these were found purely by debugging the leak code itself to work out where the data in memory and the structure definitions were mismatched.

##SystemModuleInformation

Only two minor changes need to be made. First of all, the ImageBaseAddress pointer can be found 32 bits later on in the SYSTEM_MODULE structure, so a padding variable was added. We aren’t interested in what’s contained in the extra 32 bits so I didn’t bother digging into this.

typedef struct SYSTEM_MODULE {
	ULONG           Reserved1;
	ULONG           Reserved2;
#ifdef _WIN64
	ULONG		Reserved3;
#endif
	PVOID           ImageBaseAddress;

Additionally the printf statement used to print the base addresses once NtQuerySystemInformation has been called needed to be updated to print 64 bit pointers.

printf("Base Addr 0x%llx\r\n", kernelImageBase);

Once built this could be successfully run on Windows 8 64 bit:

The final code is on Github.
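The effect of the extra padding field can be checked with offsetof. The sketch below is an illustration under assumptions rather than anything taken from the repository: it uses fixed width types in place of ULONG and an 8 byte PVOID, and assumes a typical 64 bit ABI where 8 byte values are 8 byte aligned. PADDED_LAYOUT mirrors the fix above. POINTER_LAYOUT is one possible explanation, which the post does not confirm: that the two leading reserved fields are really pointer sized. Both put the base address at the same offset.

```c
#include <stddef.h>
#include <stdint.h>

/* The layout used above for 64 bit builds: three 32 bit fields followed
   by an 8 byte pointer (modelled here as uint64_t). */
typedef struct {
	uint32_t Reserved1;
	uint32_t Reserved2;
	uint32_t Reserved3;          /* the padding added for _WIN64 */
	uint64_t ImageBaseAddress;
} PADDED_LAYOUT;

/* Hypothetical alternative: the two leading fields are pointer sized. */
typedef struct {
	uint64_t Reserved1;
	uint64_t Reserved2;
	uint64_t ImageBaseAddress;
} POINTER_LAYOUT;

/* Offset of the base address in each layout. */
static size_t padded_offset(void)  { return offsetof(PADDED_LAYOUT, ImageBaseAddress); }
static size_t pointer_offset(void) { return offsetof(POINTER_LAYOUT, ImageBaseAddress); }
```

Under those assumptions both functions return 16: the compiler inserts another 4 bytes after Reserved3 to align the pointer, so the single extra ULONG moves the field by 8 bytes in total.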

##SystemHandleInformation

Literally all I needed to do for SystemHandleInformation was swap out a print statement and everything ran fine.

#ifdef _WIN64
	printf("Object 0x%llx\t", object);
#else
	printf("Object 0x%X\t", object);
#endif

Running on Windows 8 64 bit:

The final code is on Github.

##SystemLockInformation

To get SystemLockInformation working on 64 bit Windows I had to add another padding variable. There didn’t seem to be anything being put in the variable when I was testing, but it might have some use I haven’t noticed. The field sizes don’t add up for it to be an alignment issue.

	ULONG	Reserved2[2];
#ifdef _WIN64
	ULONG	Reserved3;
#endif

I also had to modify the printf statement that was printing the lock address to support 64 bit addresses.

#ifdef _WIN64
	printf("Lock Address 0x%llx\t", lockAddress);
#else
	printf("Lock Address 0x%X\t", lockAddress);
#endif

After this it ran fine on Windows 8 64 bit: The final code is on Github.

##SystemExtendedProcessInformation

The changes needed for SystemExtendedProcessInformation were also minimal. 128 bits of padding were added to the SYSTEM_THREAD_INFORMATION struct - this was being used for something but I have no idea what.

#ifdef _WIN64
	ULONG Reserved[4];
#endif
}SYSTEM_THREAD_INFORMATION, *PSYSTEM_THREAD_INFORMATION;

Additionally the printf statements that dealt with addresses needed to be updated as before.

#ifdef _WIN64
	printf("Stack base 0x%llx\t", stackBase);
	printf("Stack limit 0x%llx\r\n", stackLimit);
#else
	printf("Stack base 0x%X\t", stackBase);
	printf("Stack limit 0x%X\r\n", stackLimit);
#endif

With these minor changes the code ran fine on Windows 8 64 bit: The final code is on Github.

#Windows 8.1 64 bit onward

I had a slight advantage looking at this leak on Windows 8.1, as I’d read a blog post from Alex Ionescu a while earlier and so knew to try running the binary in a slightly different way. Windows Vista introduced the idea of Integrity Levels, which causes all processes to run at one of the six integrity levels shown below. Processes with higher integrity levels can access more system resources; for example, sandboxed processes are generally run at Low Integrity and have minimal access to the rest of the system. Much more detail can be found on the MSDN page I linked above.

I created a low integrity copy of cmd.exe following the instructions found here. When I tried to run any of the NtQuerySystemInformation binaries in this command prompt I’d just get the error code 0xC0000022. This is the NTSTATUS code for STATUS_ACCESS_DENIED, defined as:

A process has requested access to an object but has not been granted those access rights.

However, when run in a medium integrity command prompt the binaries all ran fine. This means that an integrity level check must have been added to the function. You can check the integrity level processes are running at using procexp from SysInternals (see the last column).

I started looking into what had changed in NtQuerySystemInformation between Windows 8 and 8.1 to add this check. By looking at the NtQuerySystemInformation function in IDA, I found that it relied on calling the ‘ExpQuerySystemInformation’ function. Diffing two versions of ntoskrnl.exe (using the awesome Diaphora) I found that this function had undergone significant changes between the two OS releases.

Scrolling through the diffed assembly for the two implementations, it didn’t take long to find a call to ‘ExIsRestrictedCaller’ which had been added. Looking at the xrefs to this function, it was clear it was mostly being called in ExpQuerySystemInformation, as well as a few times in potentially related functions.

I took a look at the function itself and my annotated assembly can be seen below. My general understanding of how this function works is the following:

  1. Checks if an unknown value passed to it in ecx is 0, and if it is, returns 0
  2. Increments the reference count for the calling process’s token using PsReferencePrimaryToken
  3. Reads the TokenIntegrityLevel for the calling process’s token into a local variable using SeQueryInformationToken
  4. Decrements the reference count for the calling process’s token using ObDereferenceObject
  5. Checks if SeQueryInformationToken returned an error code, and if it did, returns 1
  6. If SeQueryInformationToken succeeded, compares the read token integrity level to 0x2000, which is the value for medium integrity level
  7. If the token integrity level was below 0x2000, returns 1, otherwise returns 0

Alex Ionescu has a properly reversed version of this function on his blog. Each time the function is called if it returns 1 then the calling function will return the error code seen earlier.
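The seven steps above can be summarised as a small piece of portable C. This is only a model of my reading of the logic, not the kernel’s code: the kernel calls are replaced by plain parameters, and is_restricted_caller, ecx_value, query_status and integrity_level are hypothetical names introduced for illustration.

```c
#include <stdint.h>

#define MEDIUM_INTEGRITY_RID 0x2000u   /* the medium integrity level value */

/* Model of the check. ecx_value is the unknown argument from step 1,
   query_status stands in for the NTSTATUS returned by
   SeQueryInformationToken, and integrity_level for the value it read
   from the caller's token. Returns 1 if the caller is restricted. */
static int is_restricted_caller(uint32_t ecx_value, int32_t query_status,
                                uint32_t integrity_level)
{
	if (ecx_value == 0) {            /* step 1 */
		return 0;
	}
	/* Steps 2-4 (reference the token, query it, dereference it) would
	   happen here in the kernel. */
	if (query_status < 0) {          /* step 5: the query failed */
		return 1;
	}
	/* Steps 6-7: anything below medium integrity is restricted. */
	return integrity_level < MEDIUM_INTEGRITY_RID ? 1 : 0;
}
```

For a low integrity process (integrity level 0x1000) this returns 1, which matches the STATUS_ACCESS_DENIED seen from the low integrity command prompt.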

Win32k.sys System Call Information Disclosure

#Windows 7 32 bit

This issue was originally discovered by j00ru several months before he published the whitepaper and is covered in more depth in the original blog post.

The issue was that ~20 system calls within win32k.sys were defined with return values of less than 32 bits, for example VOID or USHORT, and as a result did not clear the eax register before returning. For various reasons kernel addresses ended up in eax before the calls returned, and as such these were either fully or partially leaked by reading eax immediately after making the call.

For example, NtUserModifyUserStartupInfoFlags fully leaked the address of the ETHREAD structure. Below you can see that just before the function returns UserSessionSwitchLeaveCrit is called; this appears to load a pointer to ETHREAD into eax, which isn’t modified before the function returns, leaving the address intact.
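To make the difference between a full and a partial leak concrete, here is a small portable model of the register behaviour. It is a sketch, not kernel code: the function names are hypothetical and the example address used with it is made up.

```c
#include <stdint.h>

/* Model of a 32 bit eax when a system call with a USHORT return value
   writes only the low 16 bits (ax) and leaves the rest untouched. */
static uint32_t eax_after_ushort_return(uint32_t stale_eax, uint16_t result)
{
	return (stale_eax & 0xFFFF0000u) | result;
}

/* What user mode can recover from that register: the top 16 bits of
   whatever kernel value was left behind. */
static uint32_t leaked_upper_bits(uint32_t eax)
{
	return eax & 0xFFFF0000u;
}
```

If eax held a (made up) kernel pointer 0x85A3C030 and the call returned 1 in ax, user mode would read 0x85A30001 and recover 0x85A30000. A VOID function that never touches eax leaks all 32 bits.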

To use these system calls to leak the addresses we first need to add the standard includes and Winddi which defines some of the GDI (Graphics Device Interface) structures used by functions we’ll be calling.

#include "stdafx.h"
#include <Windows.h>
#include <Winddi.h>

I decided to call the system calls by using their user space wrappers, in this case in user32.dll and gdi32.dll, so I needed to get the offsets of the functions within the DLLs. In order to do this I dropped the DLLs into IDA, rebased the disassembly to 0 and filtered the function list to find the functions I was targeting. The functions’ start addresses are the offsets into the DLLs I needed.

I picked one function which leaked ETHREAD fully, one which leaked it partially and then the same again for W32THREAD which could be leaked in a similar way.

//0x64D4B - NtUserModifyUserStartupInfoFlags
typedef DWORD(NTAPI * lNtUserModifyUserStartupInfoFlags)(DWORD Set, DWORD Flags);
//0xA2F4 - NtUserGetAsyncKeyState
typedef DWORD(NTAPI *lNtUserGetAsyncKeyState)(DWORD key);

//0x47123 - NtGdiFONTOBJ_vGetInfo
typedef VOID(NTAPI * lNtGdiFONTOBJ_vGetInfo)(FONTOBJ *pfo,ULONG cjSize,FONTINFO *pfi);
//0x47263 - NtGdiPATHOBJ_vEnumStartClipLines
typedef VOID(NTAPI * lNtGdiPATHOBJ_vEnumStartClipLines)(PATHOBJ *ppo, CLIPOBJ *pco, SURFOBJ *pso, LINEATTRS *pla);

To call these functions we first need a handle to the DLL’s they are found in, first up we get a handle to user32.dll.

HMODULE hUser32 = LoadLibraryA("user32.dll");
if (hUser32 == NULL) {
	printf("Failed to load user32");
	return 1;
}

If this is successful we can add the first functions offset to the HMODULE value to get the functions entry point and cast it to the correct type.

lNtUserGetAsyncKeyState pNtUserGetAsyncKeyState = (lNtUserGetAsyncKeyState)((DWORD_PTR)hUser32 + 0xA2F4);

We then call the function and use inline assembly to grab the value which was left in eax and print it.

pNtUserGetAsyncKeyState(20);
unsigned int ethread = 0;
__asm {
	mov ethread, eax;
}
printf("NtUserGetAsyncKeyState ETHREAD partial disclosure: 0x%X\r\n", ethread);

We then do the same again for NtUserModifyUserStartupInfoFlags.

lNtUserModifyUserStartupInfoFlags pNtUserModifyUserStartupInfoFlags = (lNtUserModifyUserStartupInfoFlags)((DWORD_PTR)hUser32 + 0x64D4B);

pNtUserModifyUserStartupInfoFlags(20, 12);
unsigned ethread_full = 0;
__asm {
	mov ethread_full, eax;
}
printf("NtUserModifyUserStartupInfoFlags ETHREAD full disclosure: 0x%X\r\n", ethread_full);

Next up we need to call the functions which disclose the W32THREAD pointer, these are both defined in gdi32.dll so we get a handle to the DLL and call the functions as before.

HMODULE hGDI32 = LoadLibraryA("gdi32.dll");
if (hGDI32 == NULL) {
	printf("Failed to load gdi32");
	return 1;
}

lNtGdiFONTOBJ_vGetInfo pNtGdiFONTOBJ_vGetInfo = (lNtGdiFONTOBJ_vGetInfo)((DWORD_PTR)hGDI32 + 0x47123);
FONTOBJ surf = { 0 };
FONTINFO finfo = { 0 };
pNtGdiFONTOBJ_vGetInfo(&surf, 123, &finfo);

long int w32thread = 0;
__asm {
	mov w32thread, eax;
}

printf("NtGdiFONTOBJ_vGetInfo W32THREAD full disclosure: 0x%X\r\n", w32thread);


lNtGdiPATHOBJ_vEnumStartClipLines pNtGdiPATHOBJ_vEnumStartClipLines = (lNtGdiPATHOBJ_vEnumStartClipLines)((DWORD_PTR)hGDI32 + 0x47263);
PATHOBJ pathobj = { 0 };
CLIPOBJ pco = { 0 }; 
SURFOBJ pso = { 0 };
LINEATTRS pla = { 0 };
pNtGdiPATHOBJ_vEnumStartClipLines(&pathobj, &pco, &pso, &pla);
w32thread = 0;
__asm {
	mov w32thread, eax;
}
printf("NtGdiPATHOBJ_vEnumStartClipLines W32THREAD full disclosure: 0x%X\r\n", w32thread);

Once the code is built and run, we can see the addresses being leaked :D

#Windows 8 64 bit onward

The first thing that had to be done to get the code running on Windows 8 was to update the function offsets to match the new host VM’s binaries. Note that the NtGdiFONTOBJ_vGetInfo function address is missing because the function was not defined in the Windows 8 VM’s version of gdi32.

//win8, 64bit
#define NtUserModifyUserStartupInfoFlagsAddress 0x263F0
#define NtUserGetAsyncKeyStateAddress 0x3B30
#define NtGdiPATHOBJ_vEnumStartClipLinesAddress 0x67590

The second issue is that Visual Studio doesn’t support inline assembly for code targeting amd64, so I ended up adding a rather brief file named ‘asm_funcs.asm’ which consisted of the following:

_DATA SEGMENT
_DATA ENDS
_TEXT SEGMENT

PUBLIC get_rax

get_rax PROC
ret
get_rax ENDP
_TEXT ENDS
END

Effectively all this does is define a single function named ‘get_rax’ which will do nothing but return, returning the value held in rax due to the calling convention.

Some minor config changes also had to be made to the Visual Studio project to make it compile the included assembly, this essentially consists of right clicking on the project in the solution explorer, going to ‘Build Dependencies’->’Build Customizations…’ and then ticking the ‘masm’ option in the dialogue window that comes up. Elias Bachaalany has more details on this here.

This function can then be imported into the main file by declaring it as an external function.

extern "C" unsigned long long get_rax();

Finally, the variables we were reading eax into need to be changed to 64 bit sized variables and any printf statements should be updated.

The final code can be found on Github.

After all that work we can run the code on 64 bit systems and see that this issue was fixed in Windows 8! :(

This fix is referenced in the kernel section of Matt Miller’s talk at Black Hat USA 2012 on Windows 8 exploit mitigation improvements:

The method used to fix these issues was very straightforward. By looking at the versions of win32k.sys from Windows 7 vs Windows 8 (shown below) we can see that the functions are now implemented in a way that all of rax is set to a new value after calling sensitive functions. For example, in both the functions that leaked ETHREAD I looked at, UserSessionSwitchLeaveCrit was responsible for the leaked address being in rax/eax before returning, and this has now been fixed.

NtUserGetAsyncKeyState: Windows 8 implementation on the left, Windows 7 on the right. Previously this partially leaked the ETHREAD address as only the lower 16 bits of eax were being modified before the function returned. Now movsx is being used, which sign-extends the 16 bit result and so overwrites the higher bits instead of leaving them intact.

NtUserModifyUserStartupInfoFlags: Windows 8 implementation on the left, Windows 7 on the right. Previously this leaked the full ETHREAD address due to eax not being modified at all before returning, now eax is explicitly set to be 1.
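The effect of the movsx fix can be modelled in a few lines of portable C. This is a sketch that assumes the instruction extends the 16 bit result into the whole of eax; eax_after_movsx is a hypothetical name and the stale value passed to it is made up.

```c
#include <stdint.h>

/* Model of movsx on the 16 bit return value: the result is sign-extended
   into the full 32 bit register. stale_eax is accepted only to show that
   the previous register contents no longer matter. */
static uint32_t eax_after_movsx(uint32_t stale_eax, uint16_t result)
{
	(void)stale_eax;
	return (uint32_t)(int32_t)(int16_t)result;
}
```

Whatever was left in the register beforehand, the upper 16 bits are now either all zeros or all ones depending on the sign bit of the result, so nothing of the kernel pointer survives.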

Descriptor Tables

#Windows 7 32 bit

On x86 descriptor tables have various uses, the ones looked at in j00ru’s paper were the Interrupt Descriptor Table (IDT), which allows the processor to lookup what code to execute to handle interrupts and exceptions, and the Global Descriptor Table (GDT) which is used by the processor to define memory segments.

More details on descriptor tables can be found in j00ru’s paper, however the key thing to note is that they play(/played) a key role in memory segmentation and privilege separation, making them interesting values to access. The Global Descriptor Table Register (GDTR) defines the start address of the GDT and its size. It can be read using the sgdt x86 instruction, and a key note from the instruction’s documentation is the following point:

SGDT is only useful in operating-system software; however, it can be used in application programs without causing an exception to be generated.

This means that code running in Ring 3 can read the GDTR value without causing an exception but cannot write to it. The format of the GDTR is as follows, taken from the OSDEV wiki:

The Interrupt Descriptor Table Register (IDTR) defines the start address of the IDT and its size. It can be read using the sidt x86 instruction and has the same ‘handy’ feature of being executable from ring 3 as sgdt does. The format for the IDTR I used is also taken from the OSDEV wiki:

Additionally Windows allows the reading of specific entries within the GDT using the GetThreadSelectorEntry function. In j00ru’s paper he uses this to read several potentially sensitive table entries however I’ll just use it to read the Task State Segment (TSS) descriptor.

We can use inline assembly to execute the sidt instruction with a 6 byte buffer as an argument.

unsigned char idtr[6] = { 0 };
__asm {
	sidt idtr;
}

Once the idtr has been read, we just need to extract the values out of memory and print them.

unsigned int idtrBase = (unsigned int)idtr[5] << 24 |
		(unsigned int)idtr[4] << 16 |
		(unsigned int)idtr[3] << 8 |
		(unsigned int)idtr[2];
unsigned short idtrLimit = (unsigned int)idtr[1] << 8 |
		(unsigned int)idtr[0];
printf("Interrupt Descriptor Table Register base: 0x%X, limit: 0x%X\r\n", idtrBase, idtrLimit);
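The same byte shuffling is needed for both the IDTR and the GDTR, so it could be wrapped in a helper. A minimal sketch with hypothetical names (DTR32, parse_dtr32, dtr32_entry_count), relying on the facts that the 6 byte value is a 16 bit limit followed by a 32 bit base, that the limit is the table size in bytes minus one, and that descriptors are 8 bytes each in 32 bit protected mode:

```c
#include <stdint.h>

/* Decoded form of the 6 byte value written by sidt/sgdt on 32 bit x86. */
typedef struct {
	uint16_t limit;   /* size of the table in bytes, minus one */
	uint32_t base;    /* linear address of the table */
} DTR32;

/* Assemble the little-endian limit and base from the raw bytes. */
static DTR32 parse_dtr32(const unsigned char raw[6])
{
	DTR32 r;
	r.limit = (uint16_t)(raw[0] | (raw[1] << 8));
	r.base = (uint32_t)raw[2] |
		(uint32_t)raw[3] << 8 |
		(uint32_t)raw[4] << 16 |
		(uint32_t)raw[5] << 24;
	return r;
}

/* Number of 8 byte descriptors the table can hold. */
static unsigned int dtr32_entry_count(DTR32 r)
{
	return ((unsigned int)r.limit + 1) / 8;
}
```

A limit of 0x7FF, for instance, corresponds to a table of 256 entries.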

Again we can easily use inline assembly to call the sgdt instruction before extracting the base address and limit values :)

unsigned char gdtr[6] = { 0 };
__asm {
	sgdt gdtr;
}
unsigned int gdtrBase = (unsigned int)gdtr[5] << 24 |
		(unsigned int)gdtr[4] << 16 |
		(unsigned int)gdtr[3] << 8 |
		(unsigned int)gdtr[2];
unsigned short gdtrLimit = (unsigned int)gdtr[1] << 8 |
		(unsigned int)gdtr[0];
printf("Global Descriptor Table Register base: 0x%X, limit: 0x%X\r\n", gdtrBase, gdtrLimit);

Next we want to use GetThreadSelectorEntry to read the TSS entry. MSDN describes the function as follows.

BOOL WINAPI GetThreadSelectorEntry(
  _In_  HANDLE      hThread,
  _In_  DWORD       dwSelector,
  _Out_ LPLDT_ENTRY lpSelectorEntry
);

First of all we use the Store Task Register (str) instruction to get the segment selector for the TSS.

WORD tr;

__asm str tr
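The value str hands back is a segment selector rather than an address, so it may help to see how a selector breaks down. The sketch below is an illustration only: `selector_fields`, `decode_selector` and `find_descriptor` are hypothetical names, and `find_descriptor` works on an in-memory copy of a table because user mode cannot read the real GDT directly (which is why the article leans on GetThreadSelectorEntry for the actual lookup). It assumes the standard x86 selector layout: bits 0-1 are the requested privilege level, bit 2 selects GDT or LDT, and bits 3-15 are the table index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields packed into a 16 bit x86 segment selector. */
typedef struct {
    unsigned index; /* bits 3-15: entry number within the table */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 0-1: requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}

/* Return a pointer to the 8 byte descriptor that `sel` names inside a
   copy of a descriptor table, or NULL if it lies beyond the table limit
   (the limit is the offset of the last valid byte). */
static const unsigned char *find_descriptor(const unsigned char *table,
                                            uint16_t limit, uint16_t sel)
{
    size_t offset = (size_t)(sel >> 3) * 8u;
    assert(table != NULL);
    if (offset + 7u > limit) {
        return NULL;
    }
    return table + offset;
}
```

For example, a selector of 0x28 decodes to index 5 in the GDT with a requested privilege level of 0, which sits 40 bytes into the table.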

Next we create an empty LDT_ENTRY structure and call the GetThreadSelectorEntry function, passing the current thread as the thread parameter.

LDT_ENTRY tss;
GetThreadSelectorEntry(GetCurrentThread(), tr, &tss);

We can then read the TSS’s base address and limit from the now populated LDT_ENTRY structure as below.

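unsigned int tssBase = (tss.HighWord.Bits.BaseHi << 24) +
	(tss.HighWord.Bits.BaseMid << 16) +
	tss.BaseLow;
// The limit is 20 bits wide: the low 16 bits are in LimitLow, the top 4 in LimitHi.
unsigned int tssLimit = (tss.HighWord.Bits.LimitHi << 16) | tss.LimitLow;
printf("TSS base: 0x%X, limit: 0x%X\r\n", tssBase, tssLimit);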
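The LDT_ENTRY structure is just a typed view over a raw 8 byte segment descriptor, with the base and limit scattered across it. As a rough sketch of what is going on underneath, the portable code below pulls the same fields straight out of the raw bytes. The names `descriptor_fields`, `decode_descriptor` and `effective_limit` are invented for this example, and the sample bytes in the note that follows are made up rather than read from a real system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;     /* 32 bit segment base */
    uint32_t limit;    /* raw 20 bit limit */
    unsigned granular; /* G bit: limit counts 4 KiB pages when set */
    unsigned type;     /* low 4 bits of the access byte */
} descriptor_fields;

/* Decode a legacy 8 byte segment descriptor (little-endian layout):
   bytes 0-1 limit 15:0, bytes 2-4 base 23:0, byte 5 access byte,
   byte 6 limit 19:16 (low nibble) plus flags, byte 7 base 31:24. */
static descriptor_fields decode_descriptor(const unsigned char d[8])
{
    descriptor_fields f;
    assert(d != NULL);
    f.base = (uint32_t)d[2]
           | ((uint32_t)d[3] << 8)
           | ((uint32_t)d[4] << 16)
           | ((uint32_t)d[7] << 24);
    f.limit = (uint32_t)d[0]
            | ((uint32_t)d[1] << 8)
            | ((uint32_t)(d[6] & 0x0F) << 16);
    f.granular = (d[6] >> 7) & 1u;
    f.type = d[5] & 0x0Fu;
    return f;
}

/* Limit in bytes once the granularity bit has been applied. */
static uint32_t effective_limit(descriptor_fields f)
{
    return f.granular ? ((f.limit << 12) | 0xFFFu) : f.limit;
}
```

Feeding it the bytes `AB 20 00 C0 03 8B 00 80` gives a base of 0x8003C000 and a limit of 0x20AB, with a type of 0xB (a busy 32 bit TSS).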

With all of this done, we can compile the code and run it to see the addresses:

The full code including the changes for 64 bit Windows can be grabbed from Github.

#Windows 8 64 bit onward

Some non-trivial changes had to be made to the code to get it to work on 64 bit Windows. Most significantly Visual Studio doesn’t support inline assembly for projects targeting amd64. Luckily for sidt/sgdt we can get around this by using some of the Compiler Intrinsics defined by Visual Studio. Reading the GDTR was replaced with the following code.

unsigned char gdtr[10] = { 0 };
_sgdt(gdtr);
unsigned long long gdtrBase = (unsigned long long)gdtr[9] << 56 |
	(unsigned long long)gdtr[8] << 48 |
	(unsigned long long)gdtr[7] << 40 |
	(unsigned long long)gdtr[6] << 32 |
	(unsigned long long)gdtr[5] << 24 |
	(unsigned long long)gdtr[4] << 16 |
	(unsigned long long)gdtr[3] << 8 |
	(unsigned long long)gdtr[2];
unsigned short gdtrLimit = (unsigned int)gdtr[1] << 8 |
	(unsigned int)gdtr[0];
printf("Global Descriptor Table Register base: 0x%llx, limit: 0x%X\r\n", gdtrBase, gdtrLimit);

The _sgdt intrinsic works identically to calling sgdt with inline assembly. The size of gdtr also had to be updated to reflect the larger pointers on 64 bit systems. The same changes needed to be made for reading the IDTR as well.

unsigned char idtr[10] = { 0 };
__sidt(idtr); 
unsigned long long idtrBase = (unsigned long long)idtr[9] << 56 |
	(unsigned long long)idtr[8] << 48 |
	(unsigned long long)idtr[7] << 40 |
	(unsigned long long)idtr[6] << 32 |
	(unsigned long long)idtr[5] << 24 |
	(unsigned long long)idtr[4] << 16 |
	(unsigned long long)idtr[3] << 8 |
	(unsigned long long)idtr[2];
unsigned short idtrLimit = (unsigned int)idtr[1] << 8 |
	(unsigned int)idtr[0];
printf("Interrupt Descriptor Table Register base: 0x%llx, limit: 0x%X\r\n", idtrBase, idtrLimit);
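The long chains of shifts above are easy to get wrong by one byte, so here is a sketch of the same decoding written as a loop. It is portable C that operates on a buffer already filled by `_sgdt` or `__sidt`. `dtr64`, `decode_dtr64` and `idt64_entry_count` are hypothetical names for this example, and the entry count assumes the 16 byte gate descriptors used by the IDT in 64 bit mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded view of a 64 bit descriptor table register. */
typedef struct {
    uint16_t limit; /* size of the table in bytes, minus one */
    uint64_t base;  /* linear address of the table */
} dtr64;

/* Decode the 10 bytes written by sgdt/sidt in 64 bit mode:
   bytes 0-1 hold the limit, bytes 2-9 the base, both little-endian. */
static dtr64 decode_dtr64(const unsigned char raw[10])
{
    dtr64 out;
    int i;
    assert(raw != NULL);
    out.limit = (uint16_t)(raw[0] | (raw[1] << 8));
    out.base = 0;
    /* Walk from the most significant byte down to the least. */
    for (i = 9; i >= 2; i--) {
        out.base = (out.base << 8) | raw[i];
    }
    return out;
}

/* Number of 16 byte gate descriptors covered by a 64 bit IDT limit. */
static unsigned idt64_entry_count(dtr64 r)
{
    return ((unsigned)r.limit + 1u) / 16u;
}
```

A limit of 0xFFF, for instance, corresponds to a full IDT of 256 gates.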

Finally intrin.h needs to be added to the includes as that’s where the Compiler Intrinsics are defined.

#include <intrin.h>

GetThreadSelectorEntry doesn’t seem to have a straight 64 bit implementation, so reading the TSS was abandoned for the 64 bit version.

Since the sidt/sgdt instructions being executable from Ring 3 is a feature of the amd64 instruction set as opposed to an operating system feature, these values are still readable on Windows 8, Windows 8.1 and Windows 10, regardless of the integrity level the process is run at or what permissions the user has.

#Hyper-V

According to Dave Weston and Matt Miller’s Black Hat presentation on Windows 10’s exploit mitigation improvements, if Hyper-V is enabled on a system and the sidt or sgdt instructions are executed the hypervisor will trap on them and stop the values being returned.

I didn’t have a machine I could test this on though.

Win32k.sys Object Handle Addresses

#Windows 7 32 bit

Win32k is the main driver responsible for providing the functionality to output graphics to monitors, printers, etc. on Windows. It maintains a per-session handle table which stores all GDI (Graphics Device Interface) and User handles, where a session consists of all of the processes and other system objects that represent a single user’s logon session.

In order to reduce the performance overhead of accessing this table it is mapped into all GUI processes in user space. The address of this table in user space is exported as gSharedInfo by user32.dll.

This allows the addresses of all GDI and User objects in kernel memory to be discovered from user mode. First of all we need to define what this table looks like in memory; the following structures were taken and then abused from ReactOS.

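typedef struct _HANDLEENTRY {
	PVOID	phead;		// kernel address of the object
	ULONG_PTR	pOwner;	// pointer sized so the layout also holds on 64 bit
	BYTE	bType;		// object type, 0 marks a free entry
	BYTE	bFlags;
	WORD	wUniq;
}HANDLEENTRY, *PHANDLEENTRY;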

typedef struct _SERVERINFO {
	DWORD	dwSRVIFlags;
	DWORD	cHandleEntries;
	WORD	wSRVIFlags;
	WORD	wRIPPID;
	WORD	wRIPError;
}SERVERINFO, *PSERVERINFO;

typedef struct _SHAREDINFO {
	PSERVERINFO	psi;
	PHANDLEENTRY	aheList;
	ULONG		HeEntrySize; 
	ULONG_PTR	pDispInfo;
	ULONG_PTR	ulSharedDelta;
	ULONG_PTR	awmControl; 
	ULONG_PTR	DefWindowMsgs; 
	ULONG_PTR	DefWindowSpecMsgs; 
}SHAREDINFO, *PSHAREDINFO;

Next we need to get a handle to the user32 DLL and find the address of the gSharedInfo variable.

HMODULE hUser32 = LoadLibraryA("user32.dll");
PSHAREDINFO gSharedInfo = (PSHAREDINFO)GetProcAddress(hUser32, "gSharedInfo");

Once the table’s location in user space has been resolved we can just iterate over the handle table, printing the kernel address of each object, its owner and the object’s type.

for (unsigned int i = 0; i < gSharedInfo->psi->cHandleEntries; i++) {
	HANDLEENTRY entry = gSharedInfo->aheList[i];
	if (entry.bType != 0) { //ignore free entries
		printf("Head: 0x%X, Owner: 0x%X, Type: 0x%X\r\n", entry.phead, entry.pOwner, entry.bType);
	}
}
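Rather than dumping every entry, the same table can be used to look up the kernel address behind one specific handle. The sketch below shows the idea against a simplified, portable stand-in for the table. It rests on an assumption this article does not verify: that the low 16 bits of a User handle are its index into the table and the upper 16 bits are the wUniq counter, which is how the encoding is commonly described. `handle_entry` and `lookup_handle` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for HANDLEENTRY using fixed-width types. */
typedef struct {
    uintptr_t phead;  /* kernel address of the object */
    uintptr_t pOwner; /* kernel address of the owner */
    uint8_t   bType;  /* 0 marks a free entry */
    uint8_t   bFlags;
    uint16_t  wUniq;  /* uniqueness counter for the slot */
} handle_entry;

/* Return the kernel address recorded for `handle`, or 0 when the index
   is out of range, the slot is free, or the uniqueness value is stale.
   ASSUMPTION: index in the low 16 bits, wUniq in the upper 16 bits. */
static uintptr_t lookup_handle(const handle_entry *table, size_t count,
                               uint32_t handle)
{
    size_t index = handle & 0xFFFFu;
    uint16_t uniq = (uint16_t)(handle >> 16);
    assert(table != NULL);
    if (index >= count) {
        return 0;
    }
    if (table[index].bType == 0 || table[index].wUniq != uniq) {
        return 0;
    }
    return table[index].phead;
}
```

Against the real table this would be called with `gSharedInfo->aheList` and `gSharedInfo->psi->cHandleEntries` cast to the matching types, which only works if the stand-in struct has the same layout as HANDLEENTRY on the target.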

Here it is running on a Windows 7 32 bit machine:

#Windows 8/8.1 64 bits

A couple of minor changes were needed to port the code to 64 bits. First of all the SERVERINFO struct grew by 64 bits; widening the dwSRVIFlags and cHandleEntries fields seemed to bring everything in line with how it looked before.

typedef struct _SERVERINFO {
#ifdef _WIN64
	UINT64 dwSRVIFlags;
	UINT64 cHandleEntries;
#else
	DWORD dwSRVIFlags;
	DWORD cHandleEntries;
#endif
	WORD wSRVIFlags;
	WORD wRIPPID;
	WORD wRIPError;
}SERVERINFO, *PSERVERINFO;

As usual the printf statements which logged addresses also needed modifying to handle 64 bit pointers.

#ifdef _WIN64
	printf("Head: 0x%llx, Owner: 0x%llx, Type: 0x%X\r\n", entry.phead, entry.pOwner, entry.bType);
#else
	printf("Head: 0x%X, Owner: 0x%X, Type: 0x%X\r\n", entry.phead, entry.pOwner, entry.bType);
#endif
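One way to avoid the #ifdef is to print pointer-sized values through the `PRIxPTR` macro from inttypes.h, which expands to the right length modifier for the target. This is a sketch of an alternative rather than what the project does: it assumes a compiler with C99 library support (recent versions of Visual Studio should qualify), and `format_address` is a name invented for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a pointer-sized value as lowercase hex with a 0x prefix.
   Returns the number of characters snprintf wanted to write. */
static int format_address(char *buf, size_t len, uintptr_t value)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%" PRIxPTR, value);
}
```

The same format string, `"Head: 0x%" PRIxPTR`, could then be used with `(uintptr_t)entry.phead` on both 32 and 64 bit builds.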

With these changes made, it ran successfully on Windows 8.1 :)

#Windows 10?

In Dave Weston and Matt Miller’s Black Hat USA talk they state that the GDI shared handle table no longer discloses kernel addresses. However, when I ran the binary in a Windows 10 64 bit Anniversary Edition virtual machine, I got what look a lot like kernel pointers. The disclosed addresses are in the right value range to match the expected session space address range in kernel memory, or at least the range that was present in Windows 7 64 bit as described by Code Machine (I couldn’t find an equivalent map for Windows 10).
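The "does this look like a kernel pointer" check can be made a little less hand-wavy with a range test. The sketch below is a guess at how one might automate it, not something taken from the project. The session space bounds are the ones I believe are commonly quoted for Windows 7 64 bit and should be treated as an assumption to verify against the Code Machine map, and `is_kernel_address64` and `count_session_addresses` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the upper canonical half with 48 bit virtual addresses,
   which is where 64 bit Windows places kernel memory. */
#define KERNEL_SPACE_START  0xFFFF800000000000ULL

/* ASSUMPTION: session space bounds as commonly quoted for Windows 7
   64 bit. They are not taken from this article and may differ on
   other versions of Windows. */
#define SESSION_SPACE_START 0xFFFFF90000000000ULL
#define SESSION_SPACE_END   0xFFFFF97FFFFFFFFFULL

static int is_kernel_address64(uint64_t addr)
{
    return addr >= KERNEL_SPACE_START;
}

/* Count how many values fall inside the assumed session space range. */
static size_t count_session_addresses(const uint64_t *addrs, size_t n)
{
    size_t i;
    size_t hits = 0;
    assert(addrs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (addrs[i] >= SESSION_SPACE_START &&
            addrs[i] <= SESSION_SPACE_END) {
            hits++;
        }
    }
    return hits;
}
```

Running the phead values from the handle table through `count_session_addresses` gives a quick number for how many of them land in the assumed range, which is only as trustworthy as the bounds themselves.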

Next up I loaded up a Windows 8 64 bit machine with a kernel debugger attached, dumped the handle table from user mode and compared it to what I could see using the debugger. A few of the matching values are highlighted below but all the expected values from the user mode code are present.

I then did the same for the Windows 10 64 bit virtual machine I had been using.

I found that the structure of the handle table and the values pointed at were very much consistent with each other between the OS versions. I don’t currently have more time to dig into this and it’s likely me being confused (which isn’t hard…), so I’m leaving it as a question mark for now.