The DEX That Only Existed in Memory
I decompiled a production application and found nothing inside.
Not nothing as in poorly written. Nothing as in absent. The APK contained eight Java classes totaling 25 kilobytes. All of them belonged to a security SDK. Not one of them handled a login, a transfer, a balance check, or any behavior you would expect from an application that moves money for millions of people.
The app worked fine on a phone. Its code was running. It just wasn’t in the file.
This is a record of where it was, how the hiding worked, and how the code was recovered. Every offset, every byte, every line of the extraction script is documented here. Readers should be able to reproduce the technique on any similarly protected application by the time they reach the end.
The empty room
apktool d base.apk -o apktool_base
ls apktool_base/smali/com/vendor/guard/
u.smali
u$a.smali
v.smali
w.smali
G1.smali
G2.smali
G3.smali
G4.smali
Eight files. All under com.vendor.guard, a commercial protection SDK. JADX produced the same result. dex2jar produced the same result. Every tool agreed: the DEX contained eight classes, none of which implemented application behavior.
DEX on disk: 25,256 bytes
Classes: 8
Packages: 1 (com.vendor.guard)
Business logic: 0
This wasn’t obfuscation. Obfuscation renames things. The code genuinely wasn’t in the file.
The encrypted payload
The assets/ directory held two files with no recognizable format:
assets/defaultv0 38,700 bytes
assets/defaultv1 300 bytes
The file command returned ASCII text, with very long lines (38700), with no line terminators. One long string. No structure, no headers, no format that any standard tool would recognize as executable code. A hex dump showed why:
xxd assets/defaultv0 | head -4
00000000: 784f 314b 6450 4168 6d35 5a45 506d 3635 xO1KdPAhm5ZEPm65
00000010: 3554 6968 6230 4247 4e4b 3961 5177 6f70 5Tihb0BGNK9aQwop
00000020: 7168 5272 3436 7967 5679 486b 6f69 5844 qhRr46ygVyHkoiXD
00000030: 4938 4370 5230 3049 614a 4179 3151 4169 I8CpR00IaJAy1QAi
No DEX magic. No binary structure. The content is base64-encoded ciphertext: a DEX file encrypted with AES, then encoded as text so it can be stored as an asset. Not one byte of this is readable as bytecode.
For comparison, this is what a DEX file should look like:
xxd classes.dex | head -4
00000000: 6465 780a 3033 3700 ce6a 9b84 dca3 caef dex.037..j......
00000010: 4806 7604 36dd dc73 1b06 591c d43a c3e7 H.v.6..s..Y..:..
00000020: a862 0000 7000 0000 7856 3412 0000 0000 .b..p...xV4.....
00000030: 0000 0000 d861 0000 6e01 0000 7000 0000 .....a..n...p...
The first four bytes of any DEX file are 64 65 78 0a, the ASCII string dex\n, followed by a version number (037 here) and a null terminator. defaultv0 had none of that. Whatever was inside had been encrypted and base64-encoded before being placed in the APK.
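The comparison between the two hex dumps can be automated. The following is a minimal sketch, assuming the asset has already been read into memory as bytes; the names has_dex_magic and classify_blob are illustrative and not part of the original tooling.

```python
import base64
import binascii

DEX_VERSIONS = (b"035", b"036", b"037", b"038", b"039")


def has_dex_magic(data: bytes) -> bool:
    """True if data starts with 'dex\\n', a known version string, and a NUL byte."""
    return (
        len(data) >= 8
        and data[:4] == b"dex\n"
        and data[4:7] in DEX_VERSIONS
        and data[7] == 0
    )


def classify_blob(data: bytes) -> str:
    """Rough triage of an asset: plain DEX, base64 text, or something else."""
    if has_dex_magic(data):
        return "dex"
    try:
        # validate=True rejects any byte outside the base64 alphabet
        decoded = base64.b64decode(data, validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    if has_dex_magic(decoded):
        return "base64-wrapped dex"
    return "base64, not dex after decoding"
```

An asset like defaultv0 would be expected to land in the last category: valid base64 whose decoded bytes still show no DEX magic, which is consistent with an encryption layer underneath the encoding.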
How the hiding works
Those eight classes in the APK weren’t the application. They were a four-stage bootstrap mechanism that decrypted and loaded the real application at runtime. Understanding this chain is necessary before the extraction makes sense.
Stage 1: G1, the first code to run
G1 extends AppComponentFactory, an Android API introduced in API 28. It fires before the Application object is even created, the earliest possible entry point in an Android process.
# G1.smali — instantiateClassLoader() (the first method Android calls)
.class public final Lcom/vendor/guard/G1;
.super Landroid/app/AppComponentFactory;
# This is where the native library gets loaded:
invoke-static {}, Lcom/vendor/guard/G4;->X01()V
G4.X01() loads libguard.so, a 27-megabyte native library. Once loaded, its .init_array constructors and JNI_OnLoad execute immediately, before control returns to Java.
Stage 2: G3, the static initializer
G3 is a ContentProvider. Android instantiates all declared ContentProviders before calling Application.onCreate(). The critical part isn’t onCreate() but the static class initializer <clinit>, which runs the moment the class is loaded:
# G3.smali
.method static constructor <clinit>()V
.locals 6
invoke-static {}, Lcom/vendor/guard/G2;->X03()V
return-void
.end method
One line. It calls G2.X03(), which is where the decryption and class loading happen.
Stage 3: G2, the proxy Application
G2 extends Application. It serves as a wrapper around the real application class, which doesn’t exist in the APK because it is inside the encrypted payload. The key method is X03():
# G2.smali — X03()V
# After swapping mApplication via LoadedApk reflection:
invoke-static {}, Lcom/vendor/guard/G2;->X01()V # ← NATIVE CALL
G2.X01() is declared as public static native; it is implemented in libguard.so, not in Java. This native function:
- Reads assets/defaultv0 and assets/defaultv1 from the APK
- Decrypts them using AES
- Passes the decrypted bytes to dalvik.system.InMemoryDexClassLoader
Stage 4: InMemoryDexClassLoader, memory-only loading
This is the mechanism that makes the code invisible.
InMemoryDexClassLoader (Android API 26+) takes a ByteBuffer, a raw byte array in memory, and loads it as executable DEX. It doesn’t write a file. It doesn’t create a cache in /data/dalvik-cache. The decrypted bytecode exists only as a region of process memory: anonymous, unnamed, invisible to any tool that reads the filesystem.
// What libguard.so does internally (reconstructed):
byte[] decrypted = aesDecrypt(readAsset("defaultv0"), key);
ByteBuffer buffer = ByteBuffer.wrap(decrypted);
ClassLoader loader = new InMemoryDexClassLoader(buffer, parentClassLoader);
// The DEX is now loaded. It will never touch the disk.
WARNING
This is the same technique used by packed Android malware to hide payloads from antivirus scanners. The motivation is different. The mechanism is identical.
Why Frida couldn’t help
The natural approach was Frida: spawn the app, inject a script before initialization, hook the native decryption, intercept the DEX bytes in transit. A comprehensive bypass script was written: 1,131 lines covering every detection vector documented in the SDK.
frida -U -f com.target.app -l bypass.js
____
/ _ | Frida 17.6.2 - A world-class dynamic instrumentation toolkit
| (_| |
> _ | Commands:
/_/ |_|
Spawned `com.target.app`. Resuming main thread!
[Remote::com.target.app ]->
Process terminated
Connection dropped before the script completed loading.
The SDK implements thirteen anti-instrumentation techniques across two native libraries. Three of them are worth understanding because they explain why code injection wasn’t viable:
DT_DEBUG link_map traversal. The SDK reads the ELF dynamic linker’s internal link_map structure directly through the DT_DEBUG entry. This enumerates every shared object in the process without calling dl_iterate_phdr or any other API that Frida could intercept. Hiding a library from the standard enumeration function does nothing against direct structure traversal.
YARA in-process scan. The library embeds a full YARA engine (yr_rules_scan_proc) with encrypted pattern rules stored in assets/sdkconfig. It scans process memory for byte sequences characteristic of instrumentation frameworks. The rules aren’t readable without decrypting the config file first.
libc self-verification. The SDK hooks several libc functions internally using its own wrapHook function, then calls those functions and verifies the execution path. If a Frida trampoline has been inserted at the function prologue, the verification fails and detection triggers.
The 1,131-line bypass script mapped the entire detection landscape but couldn’t outrun it. What this ruled out was any approach that required injecting code into the target process.
Reading memory from the outside
A process can’t hide its own memory from the kernel. This isn’t a vulnerability. It is how Linux works.
/proc/[pid]/mem is a pseudo-file that maps directly to a process’s virtual address space. Reading from offset N returns the bytes at virtual address N. The reader needs root. On a rooted Android device, that is satisfied by adb shell su -c.
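The offset-equals-address property means a read is just a seek followed by a read. Below is a minimal sketch of that idea, assuming code running as root on the device itself; the extraction script in this article instead drives dd from the host, and the function name read_virtual is illustrative.

```python
def read_virtual(mem_path: str, address: int, length: int) -> bytes:
    """Read `length` bytes at virtual address `address` from a /proc/<pid>/mem-style file.

    The file offset is the virtual address, so no translation is needed.
    Reading an unmapped address typically raises OSError rather than returning data.
    """
    with open(mem_path, "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)
```

With a real target, the call would look like read_virtual(f"/proc/{pid}/mem", 0x7a00080000, 8), where the address comes from the process's memory map. The same function works on any ordinary file, which makes it easy to try against a saved dump.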
After InMemoryDexClassLoader loads the decrypted DEX, the bytecode sits in an anonymous memory region. No file name. No label. But there is a property that no runtime obfuscation can remove: the DEX file format requires specific magic bytes at the start of every valid file.
The DEX header
Offset Size Field Meaning
------ ---- ------------ --------------------------------
0x00 8 magic "dex\n035\0" (or 036, 037, 038, 039)
0x08 4 checksum Adler-32 of everything past this field
0x0C 20 signature SHA-1 hash
0x20 4 file_size Total DEX file size (uint32, little-endian)
0x24 4 header_size Always 0x70 (112 bytes)
0x28 4 endian_tag 0x12345678 for little-endian
The magic is the anchor. file_size at offset 0x20 gives the exact boundary. With these two values, a complete DEX file can be carved from a raw memory dump with no additional context.
In hex, the magic looks like this:
64 65 78 0a 30 33 35 00
d e x \n 0 3 5 \0
Any byte sequence in memory that starts with 64 65 78 0a and is followed by a valid version string (035 through 039) is a candidate.
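The header table translates directly into a few struct reads. This is a sketch rather than the script's own code: the DexHeader type and parse_dex_header function are names introduced here for illustration, and only the fields listed in the table are decoded.

```python
import struct
from typing import NamedTuple, Optional

DEX_VERSIONS = (b"035", b"036", b"037", b"038", b"039")


class DexHeader(NamedTuple):
    version: str
    checksum: int       # Adler-32, stored little-endian at 0x08
    signature: bytes    # 20-byte SHA-1 at 0x0C
    file_size: int      # uint32 at 0x20
    header_size: int    # uint32 at 0x24, expected 0x70
    endian_tag: int     # uint32 at 0x28, expected 0x12345678


def parse_dex_header(data: bytes, pos: int = 0) -> Optional[DexHeader]:
    """Decode the header fields at `pos`, or return None if the magic is not valid."""
    if len(data) - pos < 0x2C:
        return None  # not enough bytes to reach endian_tag
    if data[pos:pos + 4] != b"dex\n" or data[pos + 7] != 0:
        return None
    version = data[pos + 4:pos + 7]
    if version not in DEX_VERSIONS:
        return None
    (checksum,) = struct.unpack_from("<I", data, pos + 0x08)
    signature = data[pos + 0x0C:pos + 0x20]
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, pos + 0x20)
    return DexHeader(version.decode(), checksum, signature,
                     file_size, header_size, endian_tag)
```

A candidate whose header_size is not 0x70 or whose endian_tag is not 0x12345678 is very likely a false positive, so those two fields can serve as extra filters beyond the magic and version check.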
The extraction script
The dump script is 380 lines of Python. It runs on the host machine and communicates with the device through adb. Below is a walk-through of each critical function.
Finding the process
import struct
import subprocess
import sys

PACKAGE = sys.argv[1] if len(sys.argv) > 1 else "com.target.app"
DEX_MAGIC = b"dex\n"

def get_pid():
    """Return the app's PID as a string, or None if it is not running."""
    out, _ = adb(f"pidof {PACKAGE}")
    pid = out.decode().strip()
    if not pid:
        return None
    return pid.split()[0]  # pidof may return multiple PIDs; take the first
pidof returns the process ID. If the app has multiple processes (common with Firebase or WebView), pidof prints several PIDs and the script takes the first, on the assumption that it belongs to the main process where InMemoryDexClassLoader runs.
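The adb() helper that get_pid() calls is not shown in the excerpts. Judging from the call sites (out, _ = adb(...) followed by out.decode()), it returns a pair of byte strings. A minimal sketch under that assumption follows; build_adb_command and the root and timeout parameters are hypothetical names introduced here.

```python
import subprocess


def build_adb_command(shell_cmd: str, root: bool = True) -> list:
    """Argument list for running shell_cmd on the device, through su when root is needed."""
    if root:
        return ["adb", "shell", "su", "-c", shell_cmd]
    return ["adb", "shell", shell_cmd]


def adb(shell_cmd: str, root: bool = True, timeout: int = 30):
    """Run a device shell command and return (stdout, stderr) as bytes."""
    result = subprocess.run(
        build_adb_command(shell_cmd, root),
        capture_output=True,
        timeout=timeout,
    )
    return result.stdout, result.stderr
```

Keeping the command construction separate from the subprocess call makes it possible to check the exact command line without a device attached.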
Parsing the memory map
def get_maps(pid):
    """Parse /proc/<pid>/maps into a list of region dicts."""
    out, _ = adb(f"cat /proc/{pid}/maps")
    lines = out.decode(errors="replace").strip().split("\n")
    regions = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:  # skip blank or truncated lines
            continue
        addr_range = parts[0]
        perms = parts[1]
        name = " ".join(parts[5:]) if len(parts) > 5 else ""
        start_str, end_str = addr_range.split("-")
        start = int(start_str, 16)
        end = int(end_str, 16)
        regions.append({
            "start": start, "end": end,
            "size": end - start, "perms": perms, "name": name
        })
    return regions
Each line in /proc/pid/maps describes one virtual memory region. The format is:
address perms offset dev inode pathname
7a00080000-7a00f80000 rw-p 00000000 00:00 0 [anon:dalvik-...]
The permissions field matters. Only regions with r (readable) in position 0 can be dumped. Write or execute permissions are irrelevant for reading.
Filtering candidates
Not every region contains DEX. The script filters aggressively:
def is_candidate_region(region):
    perms = region["perms"]
    name = region["name"]
    size = region["size"]
    if "r" not in perms:  # must be readable
        return False
    if size < 4096:  # too small for a DEX file
        return False
    if size > 100 * 1024 * 1024:  # larger than 100 MB: not a DEX
        return False
    # Skip named libraries and system paths
    skip_patterns = [
        "/dev/", "/proc/", "/sys/", "/system/", "/vendor/", "/apex/",
        "libflutter.so", "libapp.so", "libc.so", "libart.so",
        "libguard.so", "librisk.so",  # protection SDK libraries
        ".oat", ".art", "boot.art", "boot.oat",
        "linker64", "vdso", "[stack",
    ]
    for pat in skip_patterns:
        if pat in name:
            return False
    # Accept: anonymous regions and dalvik- or anon-labeled regions
    if not name or name.strip() == "":
        return True  # anonymous: top candidate
    if "dalvik" in name.lower():
        return True
    if "anon" in name.lower():
        return True
    return False
Reasoning behind each filter:
Anonymous regions (empty name) are the primary target. InMemoryDexClassLoader allocates through the runtime, which produces anonymous mmap regions. They show up in /proc/pid/maps with no pathname, just an address range and permissions.
dalvik-labeled regions ([anon:dalvik-*]) are ART runtime internal allocations. Non-moving space and zygote space can contain loaded DEX data.
Named .so files are excluded because they are memory-mapped library code, not DEX. libguard.so itself is excluded because it contains the decryption engine, not the decrypted output.
Size bounds (4 KB to 100 MB) eliminate regions that cannot plausibly contain a complete DEX. InMemoryDexClassLoader output is typically between 1 and 20 megabytes.
Sorting by likelihood
candidates.sort(key=lambda r: (
    0 if 1024*1024 <= r["size"] <= 20*1024*1024 else 1,  # 1-20 MB first
    0 if not r["name"] else 1,                           # anonymous first
    r["size"],                                           # smaller first
))
Regions between 1 and 20 megabytes are scanned first. Anonymous regions take priority over named ones. Within each tier, smaller regions come first, a heuristic that reduces time to first discovery.
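A small worked example shows how the three-part key behaves. The sample regions below are made up for illustration, and likelihood_key is simply the lambda above given a name.

```python
MB = 1024 * 1024


def likelihood_key(region: dict) -> tuple:
    """Sort key: 1-20 MB tier first, then anonymous, then smaller size."""
    return (
        0 if MB <= region["size"] <= 20 * MB else 1,
        0 if not region["name"] else 1,
        region["size"],
    )


# Hypothetical regions, not taken from the real memory map
sample = [
    {"name": "[anon:dalvik-zygote space]", "size": 3 * MB},
    {"name": "", "size": 64 * 1024},
    {"name": "", "size": 15 * MB},
    {"name": "", "size": 7 * MB},
]

ordered = sorted(sample, key=likelihood_key)
for region in ordered:
    print(region["name"] or "<anonymous>", region["size"] // 1024, "KB")
```

The two anonymous regions inside the 1 to 20 MB tier come first (7 MB, then 15 MB), followed by the named 3 MB region, and the anonymous 64 KB region comes last. Tuple comparison gives the size tier precedence over anonymity, so a small anonymous region is scanned after a mid-sized named one.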
Dumping memory
def dump_region_dd(pid, start, size, timeout=60):
    block_size = 4096
    skip_blocks = start // block_size
    count_blocks = (size + block_size - 1) // block_size
    cmd = f"dd if=/proc/{pid}/mem bs={block_size} skip={skip_blocks} count={count_blocks} 2>/dev/null"
    full_cmd = ["adb", "shell", "su", "-c", cmd]
    result = subprocess.run(full_cmd, capture_output=True, timeout=timeout)
    return result.stdout
dd reads raw bytes from /proc/pid/mem. The skip parameter is the virtual address divided by the block size, which positions the read at the correct page. count is the region size rounded up to the nearest page boundary. The 2>/dev/null silences dd’s progress output, which would otherwise corrupt the binary data piped back through adb.
A subtlety: dd’s skip is counted in whole blocks, so the read can only begin on a 4,096-byte boundary. The memory map always reports page-aligned boundaries, so this is satisfied naturally. If a region were unaligned, the integer division would round the start down, a conservative choice that may include a few extra bytes at the start but will not miss the target.
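The block arithmetic is easy to check by hand. The sketch below is a variant, not the script's code: dd_window is a name introduced here, and unlike dump_region_dd it also widens the block count by the rounded-down lead so that an unaligned region is still covered to its end.

```python
BLOCK_SIZE = 4096


def dd_window(start: int, size: int, block_size: int = BLOCK_SIZE):
    """Return (skip_blocks, count_blocks, lead) for a dd read covering [start, start + size).

    lead is the number of extra bytes read before `start` because dd can
    only begin on a block boundary.
    """
    skip_blocks = start // block_size
    lead = start - skip_blocks * block_size
    count_blocks = (lead + size + block_size - 1) // block_size
    return skip_blocks, count_blocks, lead


# The 15 MB region from the sample maps line: 7a00080000-7a00f80000
skip, count, lead = dd_window(0x7A00080000, 0x7A00F80000 - 0x7A00080000)
print(hex(skip), count, lead)
```

For the page-aligned sample region this gives a skip of 0x7a00080 blocks, a count of 3840 blocks, and a lead of 0. For an unaligned start, the caller would slice the dump as data[lead:lead + size] to recover exactly the requested range.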
Scanning for DEX magic
This is the core of the extraction:
def find_dex_in_data(data, region_start):
    dex_files = []
    offset = 0
    while offset < len(data) - 112:  # 112 bytes = minimum DEX header
        pos = data.find(DEX_MAGIC, offset)  # search for b"dex\n"
        if pos == -1:
            break
        magic8 = data[pos:pos+8]
        # Validate version: must be 035, 036, 037, 038, or 039
        if len(magic8) >= 8 and magic8[4:7] in [b"035", b"036", b"037", b"038", b"039"]:
            if pos + 36 <= len(data):
                # Read file_size from offset 32 (uint32, little-endian)
                file_size = struct.unpack("<I", data[pos+32:pos+36])[0]
                # Sanity: must be between 112 bytes and 100 MB
                if 112 <= file_size <= 100 * 1024 * 1024:
                    dex_data = data[pos:pos+file_size] if pos+file_size <= len(data) else data[pos:]
                    vaddr = region_start + pos
                    dex_files.append({
                        "offset": pos,
                        "vaddr": vaddr,
                        "size": file_size,
                        "data": dex_data,
                        "version": magic8[4:7].decode(),
                    })
                    offset = pos + max(file_size, 8)  # skip past this DEX
                    continue
        offset = pos + 1  # false positive: advance one byte and keep scanning
    return dex_files
The logic:
1. Search for the four-byte sequence dex\n (64 65 78 0a).
2. Validate the next three bytes as a version number (035 through 039).
3. Read the file_size field at offset 32 from the magic, interpreted as a little-endian uint32.
4. Sanity-check the size: it must be at least 112 bytes (minimum valid DEX) and at most 100 megabytes (practical upper bound).
5. Extract exactly file_size bytes starting from the magic position.
6. Advance past the extracted DEX and continue scanning for additional files in the same region.
Step 6 matters because InMemoryDexClassLoader can load multiple DEX files, and they may end up in the same or adjacent memory regions.
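The scanning steps can be exercised without a device by carving from a synthetic buffer. The sketch below is a compact stand-in for find_dex_in_data, not a copy of it: carve and fake_dex are names introduced here, the fake images carry only a valid magic and file_size, and unlike the original this version rejects an image whose declared size runs past the end of the buffer.

```python
import struct

VERSIONS = (b"035", b"036", b"037", b"038", b"039")


def fake_dex(version: bytes, total_size: int) -> bytes:
    """Header-only stand-in: valid magic and file_size, zero-filled body."""
    blob = bytearray(total_size)
    blob[0:8] = b"dex\n" + version + b"\x00"
    struct.pack_into("<I", blob, 0x20, total_size)
    return bytes(blob)


def carve(data: bytes):
    """Yield (offset, blob) for each plausible DEX image found in data."""
    offset = 0
    while True:
        pos = data.find(b"dex\n", offset)
        if pos == -1 or pos + 0x24 > len(data):
            return  # no more magic, or too close to the end to hold file_size
        (size,) = struct.unpack_from("<I", data, pos + 0x20)
        if data[pos + 4:pos + 7] in VERSIONS and 112 <= size <= len(data) - pos:
            yield pos, data[pos:pos + size]
            offset = pos + size      # skip past this image
        else:
            offset = pos + 1         # false positive: keep scanning


# A decoy magic with a bad version, followed by two images in one buffer
buf = (bytes(100) + b"dex\nXYZ\x00" + bytes(50)
       + fake_dex(b"035", 200) + b"\xff" * 30
       + fake_dex(b"037", 300) + bytes(10))
found = list(carve(buf))
```

The decoy at offset 100 is rejected by the version check, and the two images are recovered at offsets 158 and 388 with their declared sizes of 200 and 300 bytes.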
The race condition
The unpatched app was running on a rooted emulator. Root detection, emulator detection, and integrity verification were active on background threads. The process would eventually kill itself.
for i, region in enumerate(candidates):
    # Check if process is still alive every 5 regions
    if i > 0 and i % 5 == 0:
        alive_out, _ = adb(f"kill -0 {pid} && echo ALIVE")
        if b"ALIVE" not in alive_out:
            print(f"[!] Process died after scanning {i} regions!")
            break
kill -0 sends signal 0 to the process, a no-op that succeeds only if the process exists. Checking every five regions balances speed against the cost of making an extra adb call. If the process dies mid-scan, the script saves whatever it has found so far and exits.
In practice, the process survived long enough. The scan completed before the security checks reached their verdict.
What came out
python3 dump_dex.py
[+] Found PID: 29679
[*] Reading memory maps...
[+] Found 567 memory regions
[+] 567 candidate regions, total 432.7 MB to scan
[1/567] 0x0000723db000 (6.7 MB) rw-p <anonymous>
[DEX] Found at vaddr 0x723db000, version 035, size 8119560 bytes (7929.3 KB)
[2/567] 0x007a00080000 (15.0 MB) rw-p <anonymous>
[DEX] Found at vaddr 0x7a00080000, version 037, size 9199248 bytes (8983.6 KB)
[*] Scan complete: 2 DEX files found
[+] Saved classes_0.dex: version=035, size=8119560 bytes
[+] Saved classes_1.dex: version=037, size=9199248 bytes
Two DEX files, recovered from anonymous memory regions:
classes_0.dex
DEX version: 035
Size: 8,119,560 bytes (7.7 MB)
Virtual addr: 0x723db000
Contents: Flutter plugins, Firebase, WebView, HTTP clients,
all UI activities, all API endpoint handlers
classes_1.dex
DEX version: 037
Size: 9,199,248 bytes (8.8 MB)
Virtual addr: 0x7a00080000
Contents: risk assessment SDK, guard runtime classes,
Kotlin stdlib, kill chain handlers,
emulator detection UI logic
For scale:
Visible on disk: 25,256 bytes 8 classes
Recovered from memory: 17,318,808 bytes thousands of classes
Ratio: 686 : 1
Verification
The extracted files decompile cleanly:
baksmali d dex_dump/classes_0.dex -o dex_dump/smali_0
baksmali d dex_dump/classes_1.dex -o dex_dump/smali_1
jadx dex_dump/classes_0.dex dex_dump/classes_1.dex -d dex_dump/jadx_out
A quick header check confirms the files are structurally valid:
xxd dex_dump/classes_0.dex | head -4
00000000: 6465 780a 3033 3500 cf80 0e9f df41 9035 dex.035......A.5
00000010: 5e51 f5da 6b02 68ca e30e f33b efa9 db08 ^Q..k.h....;....
00000020: 08e5 7b00 7000 0000 7856 3412 0000 0000 ..{.p...xV4.....
00000030: 0000 0000 2ce4 7b00 64d8 0000 7000 0000 ....,.{.d...p...
The magic dex.035 is at offset 0x00. At offset 0x20, the file_size field reads 08 e5 7b 00, little-endian for 0x007be508, which is 8,119,560 bytes. That matches the file on disk exactly. The bytecode wasn’t obfuscated. Encryption had been the only protective layer. Once past it, the application’s internal structure was fully readable.
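Beyond the size field, the header carries two integrity values that allow a stricter check: the Adler-32 checksum covers everything after the checksum field, and the SHA-1 signature covers everything after the signature field. The sketch below assumes the carved file is already in memory; verify_dex is an illustrative name, not part of dump_dex.py.

```python
import hashlib
import struct
import zlib


def verify_dex(data: bytes) -> dict:
    """Compare the header's size, Adler-32, and SHA-1 fields with the actual bytes."""
    (stored_checksum,) = struct.unpack_from("<I", data, 0x08)
    stored_signature = data[0x0C:0x20]
    (file_size,) = struct.unpack_from("<I", data, 0x20)
    return {
        "size_matches": file_size == len(data),
        # Adler-32 over everything past the checksum field
        "checksum_matches": (zlib.adler32(data[0x0C:]) & 0xFFFFFFFF) == stored_checksum,
        # SHA-1 over everything past the signature field
        "signature_matches": hashlib.sha1(data[0x20:]).digest() == stored_signature,
    }
```

A mismatch is not proof of a bad carve. An image read from live memory may differ from what was originally loaded if the runtime or the packer touched it afterwards, so these checks are best treated as supporting evidence alongside a clean decompile.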
What was inside the hidden code
Among the recovered classes, five kill sites across four files implemented the application’s self-termination mechanism:
# com/vendor/risk/core/UiUtil.smali
.method public static exitApp(Landroid/content/Context;)V
invoke-static {p0}, Lcom/vendor/sdk/b9;->c(Landroid/content/Context;)V
const/4 v0, 0x0
invoke-static {v0}, Ljava/lang/System;->exit(I)V # kill
return-void
.end method
# com/vendor/sdk/b9.smali — method c(), two locations:
const/4 v0, 0x0
invoke-static {v0}, Ljava/lang/System;->exit(I)V # kill
# com/vendor/sdk/bg.smali — method a():
invoke-static {}, Landroid/os/Process;->myPid()I
move-result v0
invoke-static {v0}, Landroid/os/Process;->killProcess(I)V # kill
# kotlin/system/ProcessKt.smali:
invoke-static {p0}, Ljava/lang/System;->exit(I)V # kill
Four System.exit() calls. One Process.killProcess(). Ten additional methods handling emulator detection dialogs. None of this was visible in the original APK. It existed only inside the encrypted payload, which existed only in memory after decryption.
On boundaries
There is a precise limit to what application-layer encryption can achieve on a system where the operator holds root.
InMemoryDexClassLoader was built to avoid writing DEX to disk, and it succeeds. But it can’t avoid placing the DEX in addressable memory, because the ART runtime must read the bytecode to execute it. And it can’t prevent the kernel from exposing that memory through /proc, because the kernel doesn’t answer to the application.
The encrypted payload decrypts itself because it has no other choice. Once decrypted, it is legible to anyone with the privilege to read process memory.
This isn’t a flaw. It is the boundary condition of the design, the point where application-level protection meets kernel-level authority. The defense raises the cost of analysis and defeats the most common tooling. It doesn’t prevent extraction by a reader who operates below the application layer.
Reproducibility
Host: macOS, Apple Silicon
Device: Android emulator, arm64, API 34, Magisk root
Tools: apktool 3.0.1, jadx, baksmali 2.5.2, adb, Python 3.12
Script: dump_dex.py (380 lines)
Technique: /proc/pid/mem scan for DEX magic bytes
Time: under 30 seconds from app launch to extraction complete
The technique generalizes to any application that uses InMemoryDexClassLoader or equivalent runtime DEX loading. This includes most applications protected by commercial packing SDKs that encrypt DEX at rest and load it from memory at runtime.
The script, memory map data, and candidate regions are available for reference. Everything described here was performed with open-source tools and standard Linux interfaces.
A twenty-five-kilobyte file pretended to be an entire production application. Behind it, sixteen megabytes of hidden code handled every login, every transfer, every API call. The encryption held against static analysis. It held against Frida. It didn’t hold against the kernel’s own interface for reading what a process contains.
The code was always running. It was in a place where nobody was expected to look.
NOTE
Application names, package identifiers, SDK vendor names, and other attributes that could identify the target have been redacted. The technical substance (offsets, byte values, code structure, and extraction methodology) is unaltered.
References
- DexHunter: Toward Extracting Hidden Code from Packed Android Applications — Zhang et al., ESORICS 2015
- PackerGrind: An Adaptive Unpacking System for Android Apps — Xue et al., IEEE TIFS 2021
- dalvik.system.InMemoryDexClassLoader — Android API reference
- proc(5) — Linux man page