The DEX That Only Existed in Memory

I decompiled a production application and found nothing inside.

Not nothing as in poorly written. Nothing as in absent. The APK contained eight Java classes totaling 25 kilobytes. All of them belonged to a security SDK. Not one of them handled a login, a transfer, a balance check, or any behavior you would expect from an application that moves money for millions of people.

The app worked fine on a phone. Its code was running. It just wasn’t in the file.

This is a record of where it was, how the hiding worked, and how the code was recovered. Every offset, every byte, every line of the extraction script is documented here. Readers should be able to reproduce the technique on any similarly protected application by the time they reach the end.

The empty room

apktool d base.apk -o apktool_base
ls apktool_base/smali/com/vendor/guard/

u.smali
u$a.smali
v.smali
w.smali
G1.smali
G2.smali
G3.smali
G4.smali

Eight files. All under com.vendor.guard, a commercial protection SDK. JADX produced the same result. dex2jar produced the same result. Every tool agreed: the DEX contained eight classes, none of which implemented application behavior.

DEX on disk:    25,256 bytes
Classes:        8
Packages:       1 (com.vendor.guard)
Business logic: 0

This wasn’t obfuscation. Obfuscation renames things. The code genuinely wasn’t in the file.

The encrypted payload

The assets/ directory held two files with no recognizable format:

assets/defaultv0   38,700 bytes
assets/defaultv1      300 bytes

The file command reported ASCII text, with very long lines (38700), with no line terminators. One long string. No structure, no headers, no format that any standard tool would recognize as executable code. A hex dump showed why:

xxd assets/defaultv0 | head -4

00000000: 784f 314b 6450 4168 6d35 5a45 506d 3635  xO1KdPAhm5ZEPm65
00000010: 3554 6968 6230 4247 4e4b 3961 5177 6f70  5Tihb0BGNK9aQwop
00000020: 7168 5272 3436 7967 5679 486b 6f69 5844  qhRr46ygVyHkoiXD
00000030: 4938 4370 5230 3049 614a 4179 3151 4169  I8CpR00IaJAy1QAi

No DEX magic. No binary structure. The content is base64-encoded ciphertext: a DEX file encrypted with AES, then base64-encoded so it can be stored as a text-safe asset. Not one byte of this is readable as bytecode.

For comparison, this is what a DEX file should look like:

xxd classes.dex | head -4

00000000: 6465 780a 3033 3700 ce6a 9b84 dca3 caef  dex.037..j......
00000010: 4806 7604 36dd dc73 1b06 591c d43a c3e7  H.v.6..s..Y..:..
00000020: a862 0000 7000 0000 7856 3412 0000 0000  .b..p...xV4.....
00000030: 0000 0000 d861 0000 6e01 0000 7000 0000  .....a..n...p...

The first four bytes of any DEX file are 64 65 78 0a, the ASCII string dex\n, followed by a three-digit version number (037 here) and a null terminator. defaultv0 had none of that. Whatever was inside had been encrypted and base64-encoded before being placed in the APK.
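The triage described above can be sketched in a few lines of Python. This is only an illustration, not part of the original tooling: the function name classify_asset and its return labels are made up for this example, and the sample bytes are the first 64 bytes of defaultv0 copied from the hex dump.

```python
import base64
import binascii

DEX_MAGIC = b"dex\n"

def classify_asset(data: bytes) -> str:
    """Rough triage of an asset: plain DEX, base64 text, or unknown."""
    if not data:
        return "unknown"
    if data[:4] == DEX_MAGIC:
        return "dex"
    try:
        # validate=True rejects any byte outside the base64 alphabet
        decoded = base64.b64decode(data.strip(), validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    if decoded[:4] == DEX_MAGIC:
        return "dex (base64)"
    # Valid base64, but the decoded bytes are not a DEX: likely ciphertext
    return "base64, opaque payload"

# First 64 bytes of assets/defaultv0, taken from the hex dump above
sample = b"xO1KdPAhm5ZEPm655Tihb0BGNK9aQwopqhRr46ygVyHkoiXDI8CpR00IaJAy1QAi"
print(classify_asset(sample))
```

A result of "base64, opaque payload" says only that the decoded bytes do not start with the DEX magic. It does not identify the cipher.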

How the hiding works

Those eight classes in the APK weren’t the application. They were a four-stage bootstrap mechanism that decrypted and loaded the real application at runtime. Understanding this chain is necessary before the extraction makes sense.

G1 (AppComponentFactory)
  → loads libguard.so (27 MB native)
  → returns to the Android lifecycle, which instantiates ContentProviders
  → G3 <clinit>
  → G2 (proxy Application)
  → AES decrypt (defaultv0 + defaultv1)
  → decrypted bytes
  → InMemoryDexClassLoader (DEX in memory)

Stage 1: G1, the first code to run

G1 extends AppComponentFactory, an Android API introduced in API 28. It fires before the Application object is even created, the earliest possible entry point in an Android process.

# G1.smali — instantiateClassLoader() (the first method Android calls)
.class public final Lcom/vendor/guard/G1;
.super Landroid/app/AppComponentFactory;

# This is where the native library gets loaded:
invoke-static {}, Lcom/vendor/guard/G4;->X01()V

G4.X01() loads libguard.so, a 27-megabyte native library. Once loaded, its .init_array constructors and JNI_OnLoad execute immediately, before control returns to Java.

Stage 2: G3, the static initializer

G3 is a ContentProvider. Android instantiates all declared ContentProviders before calling Application.onCreate(). The critical part isn’t onCreate() but the static class initializer <clinit>, which runs the moment the class is loaded:

# G3.smali
.method static constructor <clinit>()V
    .locals 6
    invoke-static {}, Lcom/vendor/guard/G2;->X03()V
    return-void
.end method

One line. It calls G2.X03(), which is where the decryption and class loading happen.

Stage 3: G2, the proxy Application

G2 extends Application. It serves as a wrapper around the real application class, which doesn’t exist in the APK because it is inside the encrypted payload. The key method is X03():

# G2.smali — X03()V
# After swapping mApplication via LoadedApk reflection:

invoke-static {}, Lcom/vendor/guard/G2;->X01()V    # ← NATIVE CALL

G2.X01() is declared as public static native: it is implemented in libguard.so, not in Java. This native function:

Reads assets/defaultv0 and assets/defaultv1 from the APK
Decrypts them using AES
Passes the decrypted bytes to dalvik.system.InMemoryDexClassLoader

Stage 4: InMemoryDexClassLoader, memory-only loading

This is the mechanism that makes the code invisible.

InMemoryDexClassLoader (Android API 26+) takes a ByteBuffer, a raw byte array in memory, and loads it as executable DEX. It doesn’t write a file. It doesn’t create a cache in /data/dalvik-cache. The decrypted bytecode exists only as a region of process memory: anonymous, unnamed, invisible to any tool that reads the filesystem.

// What libguard.so does internally (reconstructed):
byte[] decrypted = aesDecrypt(readAsset("defaultv0"), key);
ByteBuffer buffer = ByteBuffer.wrap(decrypted);
ClassLoader loader = new InMemoryDexClassLoader(buffer, parentClassLoader);
// The DEX is now loaded. It will never touch the disk.

WARNING

This is the same technique used by packed Android malware to hide payloads from antivirus scanners. The motivation is different. The mechanism is identical.

Why Frida couldn’t help

The natural approach was Frida: spawn the app, inject a script before initialization, hook the native decryption, intercept the DEX bytes in transit. A comprehensive bypass script was written, 1,131 lines covering every detection vector documented in the SDK.

frida -U -f com.target.app -l bypass.js

     ____
    / _  |   Frida 17.6.2 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|

Spawned `com.target.app`. Resuming main thread!
[Remote::com.target.app ]->
Process terminated

Connection dropped before the script completed loading.

The SDK implements thirteen anti-instrumentation techniques across two native libraries. Three of them are worth understanding because they explain why code injection wasn’t viable:

DT_DEBUG link_map traversal. The SDK reads the ELF dynamic linker’s internal link_map structure directly through the DT_DEBUG entry. This enumerates every shared object in the process without calling dl_iterate_phdr or any other API that Frida could intercept. Hiding a library from the standard enumeration function does nothing against direct structure traversal.

YARA in-process scan. The library embeds a full YARA engine (yr_rules_scan_proc) with encrypted pattern rules stored in assets/sdkconfig. It scans process memory for byte sequences characteristic of instrumentation frameworks. The rules aren’t readable without decrypting the config file first.

libc self-verification. The SDK hooks several libc functions internally using its own wrapHook function, then calls those functions and verifies the execution path. If a Frida trampoline has been inserted at the function prologue, the verification fails and detection triggers.

The 1,131-line bypass script mapped the entire detection landscape but couldn’t outrun it. What this ruled out was any approach that required injecting code into the target process.

Reading memory from the outside

A process can’t hide its own memory from the kernel. This isn’t a vulnerability. It is how Linux works.

/proc/[pid]/mem is a pseudo-file that maps directly to a process’s virtual address space. Reading from offset N returns the bytes at virtual address N. The reader needs root. On a rooted Android device, that is satisfied by adb shell su -c.
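The offset-equals-address property is easy to see in code. The sketch below is illustrative only. The article's script reads memory with dd over adb, not with Python on the device, and the helper names read_at and read_process_memory are invented here. The generic helper works on any file, which makes the seek-then-read logic easy to check without a target process.

```python
def read_at(path, offset, length):
    """Read up to `length` bytes starting at byte `offset` of `path`."""
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        # An unbuffered read may return fewer bytes than requested
        return f.read(length)

def read_process_memory(pid, address, length):
    """Read from a process's address space through /proc/<pid>/mem.

    The file offset is the virtual address itself. This needs root
    (or ptrace rights over the target), and the address must fall
    inside a mapped, readable region or the read fails with an error.
    """
    return read_at(f"/proc/{pid}/mem", address, length)
```

With sufficient privileges, read_process_memory(29679, 0x723db000, 8) would return the first eight bytes at that address in process 29679, the PID and address that appear in the scan output later in this article.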

After InMemoryDexClassLoader loads the decrypted DEX, the bytecode sits in an anonymous memory region. No file name. No label. But there is a property that no runtime obfuscation can remove: the DEX file format requires specific magic bytes at the start of every valid file.

The DEX header

Offset  Size  Field         Meaning
------  ----  ------------  --------------------------------
0x00    8     magic         "dex\n035\0" (or 036, 037, 038, 039)
0x08    4     checksum      Adler-32 of everything past this field
0x0C    20    signature     SHA-1 hash
0x20    4     file_size     Total DEX file size (uint32, little-endian)
0x24    4     header_size   Always 0x70 (112 bytes)
0x28    4     endian_tag    0x12345678 for little-endian

The magic is the anchor. file_size at offset 0x20 gives the exact boundary. With these two values, a complete DEX file can be carved from a raw memory dump with no additional context.

In hex, the magic looks like this:

64 65 78 0a 30 33 35 00
d  e  x  \n 0  3  5  \0

Any byte sequence in memory that starts with 64 65 78 0a and is followed by a valid version string (035 through 039) is a candidate.
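As a worked example, the header table translates directly into struct offsets. The function below is a minimal sketch (parse_dex_header is a name chosen for this illustration, not a function from the dump script). The sample bytes are the first 48 bytes of the recovered classes_0.dex shown in the Verification section.

```python
import struct

VALID_VERSIONS = {b"035", b"036", b"037", b"038", b"039"}

def parse_dex_header(data: bytes):
    """Return the leading header fields, or None if this is not a DEX header."""
    if len(data) < 0x2C or data[:4] != b"dex\n":
        return None
    if data[4:7] not in VALID_VERSIONS or data[7] != 0:
        return None
    (checksum,) = struct.unpack_from("<I", data, 0x08)
    # file_size, header_size and endian_tag are consecutive uint32 fields
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 0x20)
    return {
        "version": data[4:7].decode(),
        "checksum": checksum,
        "file_size": file_size,
        "header_size": header_size,
        "endian_tag": endian_tag,
    }

header = bytes.fromhex(
    "6465780a30333500cf800e9fdf419035"
    "5e51f5da6b0268cae30ef33befa9db08"
    "08e57b00700000007856341200000000"
)
fields = parse_dex_header(header)
print(fields["version"], fields["file_size"], hex(fields["endian_tag"]))
```

This sketch also requires the null byte after the version digits, which is slightly stricter than checking the three digits alone.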

The extraction script

The dump script is 380 lines of Python. It runs on the host machine and communicates with the device through adb. Below is a walk-through of each critical function.

Finding the process

import struct
import subprocess
import sys

PACKAGE = sys.argv[1] if len(sys.argv) > 1 else "com.target.app"
DEX_MAGIC = b"dex\n"

def get_pid():
    out, _ = adb(f"pidof {PACKAGE}")
    pid = out.decode().strip()
    if not pid:
        return None
    return pid.split()[0]  # may return multiple PIDs; take the first

pidof returns the process ID. If the app has multiple processes (common with Firebase or WebView), pidof prints several PIDs. The script takes the first, which in this case was the main process where InMemoryDexClassLoader runs.
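The adb() helper that get_pid() calls is not shown in the excerpts. A plausible reconstruction, offered only as an assumption about its shape, is a thin wrapper around subprocess that returns stdout and stderr as bytes. The names build_adb_command and first_pid are hypothetical and exist only to make the pieces testable without a device.

```python
import subprocess

def build_adb_command(shell_cmd, root=True):
    """Build the argv list for running a shell command on the device."""
    base = ["adb", "shell"]
    if root:
        base += ["su", "-c"]       # run through su on a rooted device
    return base + [shell_cmd]

def adb(shell_cmd, root=True, timeout=30):
    """Run a command on the device and return (stdout, stderr) as bytes."""
    result = subprocess.run(
        build_adb_command(shell_cmd, root),
        capture_output=True,
        timeout=timeout,
    )
    return result.stdout, result.stderr

def first_pid(pidof_output: bytes):
    """Pick the first PID from pidof output, or None if it printed nothing."""
    fields = pidof_output.decode(errors="replace").split()
    return fields[0] if fields else None

print(build_adb_command("pidof com.target.app"))
```

How su treats arguments after -c differs between root implementations, so the quoting may need adjusting for a particular device.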

Parsing the memory map

def get_maps(pid):
    out, _ = adb(f"cat /proc/{pid}/maps")
    lines = out.decode(errors="replace").strip().split("\n")
    regions = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:          # skip blank or truncated lines
            continue
        addr_range = parts[0]
        perms = parts[1]
        name = " ".join(parts[5:]) if len(parts) > 5 else ""
        start_str, end_str = addr_range.split("-")
        start = int(start_str, 16)
        end = int(end_str, 16)
        regions.append({
            "start": start, "end": end,
            "size": end - start, "perms": perms, "name": name
        })
    return regions

Each line in /proc/pid/maps describes one virtual memory region. The format is:

address           perms offset dev   inode  pathname
7a00080000-7a00f80000 rw-p 00000000 00:00 0  [anon:dalvik-...]

The permissions field matters. Only regions with r (readable) in position 0 can be dumped. Write or execute permissions are irrelevant for reading.
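Applied to the sample line above, the parsing works out as follows. This is a single-line variant written for illustration (parse_maps_line is not a function from the dump script). The region name is kept exactly as it appears in the sample, including its elision.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into a dict, or None if malformed."""
    parts = line.split(maxsplit=5)     # the pathname may contain spaces
    if len(parts) < 5:
        return None
    start_str, end_str = parts[0].split("-")
    start, end = int(start_str, 16), int(end_str, 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": parts[1],
        "readable": parts[1].startswith("r"),
        "name": parts[5].strip() if len(parts) > 5 else "",
    }

region = parse_maps_line(
    "7a00080000-7a00f80000 rw-p 00000000 00:00 0  [anon:dalvik-...]"
)
print(hex(region["size"]), region["size"] / (1024 * 1024), region["readable"])
```

The size works out to 0xf00000 bytes, exactly 15 MB, which matches the 15.0 MB region reported at 0x7a00080000 in the scan output further down.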

Filtering candidates

Not every region contains DEX. The script filters aggressively:

def is_candidate_region(region):
    perms = region["perms"]
    name = region["name"]
    size = region["size"]

    if "r" not in perms:           # must be readable
        return False
    if size < 4096:                # too small for a DEX file
        return False
    if size > 100 * 1024 * 1024:   # larger than 100MB — not a DEX
        return False

    # Skip named libraries and system paths
    skip_patterns = [
        "/dev/", "/proc/", "/sys/", "/system/", "/vendor/", "/apex/",
        "libflutter.so", "libapp.so", "libc.so", "libart.so",
        "libguard.so", "librisk.so",    # protection SDK libraries
        ".oat", ".art", "boot.art", "boot.oat",
        "linker64", "vdso", "[stack",
    ]
    for pat in skip_patterns:
        if pat in name:
            return False

    # Accept: anonymous regions, dalvik-labeled regions, other [anon:...] regions
    if not name or name.strip() == "":
        return True                 # anonymous — top candidate
    if "dalvik" in name.lower():
        return True
    if "anon" in name.lower():
        return True
    return False

Reasoning behind each filter:

Anonymous regions (empty name) are the primary target. InMemoryDexClassLoader allocates through the runtime, which produces anonymous mmap regions. They show up in /proc/pid/maps with no pathname, just an address range and permissions.

dalvik-labeled regions ([anon:dalvik-*]) are ART runtime internal allocations. The non-moving space and zygote space can contain loaded DEX data.

Named .so files are excluded because they are memory-mapped library code, not DEX. libguard.so itself is excluded because it contains the decryption engine, not the decrypted output.

Size bounds (4KB to 100MB) eliminate regions that cannot plausibly contain a complete DEX. InMemoryDexClassLoader output is typically between 1 and 20 megabytes.

Sorting by likelihood

candidates.sort(key=lambda r: (
    0 if 1024*1024 <= r["size"] <= 20*1024*1024 else 1,  # 1-20 MB first
    0 if not r["name"] else 1,                            # anonymous first
    r["size"],                                            # smaller first
))

Regions between 1 and 20 megabytes are scanned first. Anonymous regions take priority over named ones. Within each tier, smaller regions come first, a heuristic that reduces time to first discovery.
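To make the ordering concrete, here is the same sort key applied to five made-up regions. The labels, sizes, and the single named region are invented for this example and do not come from the target's memory map.

```python
MB = 1024 * 1024

def likelihood_key(region):
    """Same three-part key as above: size tier, anonymity, then size."""
    return (
        0 if MB <= region["size"] <= 20 * MB else 1,
        0 if not region["name"] else 1,
        region["size"],
    )

regions = [
    {"label": "A", "size": 15 * MB,    "name": ""},
    {"label": "B", "size": 7 * MB,     "name": ""},
    {"label": "C", "size": 512 * 1024, "name": ""},
    {"label": "D", "size": 2 * MB,     "name": "[anon:dalvik-main space]"},
    {"label": "E", "size": 50 * MB,    "name": ""},
]

order = [r["label"] for r in sorted(regions, key=likelihood_key)]
print(order)  # ['B', 'A', 'D', 'C', 'E']
```

B and A come first because they are anonymous and fall in the 1 to 20 MB tier. D is in the same tier but named. C and E fall outside the tier and are ordered by size.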

Dumping memory

def dump_region_dd(pid, start, size, timeout=60):
    block_size = 4096
    skip_blocks = start // block_size
    count_blocks = (size + block_size - 1) // block_size
    cmd = f"dd if=/proc/{pid}/mem bs={block_size} skip={skip_blocks} count={count_blocks} 2>/dev/null"
    full_cmd = ["adb", "shell", "su", "-c", cmd]
    result = subprocess.run(full_cmd, capture_output=True, timeout=timeout)
    return result.stdout

dd reads raw bytes from /proc/pid/mem. The skip parameter is the virtual address divided by the block size, which positions the read at the correct page. count is the region size rounded up to the nearest page boundary. The 2>/dev/null silences dd’s progress output, which would otherwise corrupt the binary data piped back through adb.

A subtlety: because dd counts skip in blocks, the read has to start on a block boundary. The memory map always reports page-aligned boundaries, so with a 4096-byte block size this is satisfied naturally. If a region were unaligned, the integer division would round the start down, a conservative choice that may include a few extra bytes at the start but will not miss the target.
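The block arithmetic can be checked on its own. The helper below is a sketch written for this purpose (dd_args is not a function in the dump script). Unlike the script, it also returns how many leading bytes an unaligned start would add, and it widens the block count to cover them, so a caller could trim them off.

```python
def dd_args(start, size, block_size=4096):
    """Return (skip, count, lead) for reading [start, start+size) with dd.

    skip  : number of whole blocks before the region
    count : number of blocks to read, rounded up
    lead  : extra bytes read before `start` when it is not block-aligned
    """
    skip = start // block_size
    lead = start - skip * block_size
    count = (lead + size + block_size - 1) // block_size
    return skip, count, lead

# The 15 MB anonymous region from the memory map example
skip, count, lead = dd_args(0x7A00080000, 0xF00000)
print(hex(skip), count, lead)  # 0x7a00080 3840 0
```

For an aligned region, lead is zero and the result is the same skip and count that dump_region_dd computes.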

Scanning for DEX magic

This is the core of the extraction:

def find_dex_in_data(data, region_start):
    dex_files = []
    offset = 0
    while offset < len(data) - 112:          # 112 bytes = minimum DEX header
        pos = data.find(DEX_MAGIC, offset)    # search for b"dex\n"
        if pos == -1:
            break

        magic8 = data[pos:pos+8]
        # Validate version: must be 035, 036, 037, 038, or 039
        if len(magic8) >= 8 and magic8[4:7] in [b"035", b"036", b"037", b"038", b"039"]:

            if pos + 36 <= len(data):
                # Read file_size from offset 32 (uint32, little-endian)
                file_size = struct.unpack("<I", data[pos+32:pos+36])[0]

                # Sanity: must be between 112 bytes and 100 MB
                if 112 <= file_size <= 100 * 1024 * 1024:
                    dex_data = data[pos:pos+file_size] if pos+file_size <= len(data) else data[pos:]
                    vaddr = region_start + pos

                    dex_files.append({
                        "offset": pos,
                        "vaddr": vaddr,
                        "size": file_size,
                        "data": dex_data,
                        "version": magic8[4:7].decode(),
                    })

                    offset = pos + max(file_size, 8)  # skip past this DEX
                    continue

        offset = pos + 1  # false positive — advance one byte and keep scanning

    return dex_files

The logic:

1. Search for the four-byte sequence dex\n (64 65 78 0a).
2. Validate the next three bytes as a version number (035 through 039).
3. Read the file_size field at offset 32 from the magic, interpret as uint32 little-endian.
4. Sanity check the size: it must be at least 112 bytes (minimum valid DEX) and at most 100 megabytes (practical upper bound).
5. Extract exactly file_size bytes starting from the magic position.
6. Advance past the extracted DEX and continue scanning for additional files in the same region.

Step 6 matters because InMemoryDexClassLoader can load multiple DEX files, and they may end up in the same or adjacent memory regions.
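The scanning steps can be exercised against a synthetic buffer. What follows is a condensed regex-based variant, not the script's own function. carve_dex and fake_dex are names invented for this sketch, and it records only positions and sizes, not the carved bytes. The buffer holds one false positive (a valid magic followed by a zero file_size) and two fake DEX files placed back to back.

```python
import re
import struct

MAGIC_RE = re.compile(rb"dex\n03[5-9]\x00")
HEADER_SIZE = 112
MAX_DEX_SIZE = 100 * 1024 * 1024

def carve_dex(data, base=0):
    """Locate DEX candidates in `data`; `base` is the region's start address."""
    found = []
    resume = 0                      # ignore magics inside an accepted DEX
    for match in MAGIC_RE.finditer(data):
        pos = match.start()
        if pos < resume or pos + 36 > len(data):
            continue
        (file_size,) = struct.unpack_from("<I", data, pos + 32)
        if not HEADER_SIZE <= file_size <= MAX_DEX_SIZE:
            continue                # false positive: implausible size
        found.append({
            "vaddr": base + pos,
            "size": file_size,
            "truncated": pos + file_size > len(data),
        })
        resume = pos + file_size
    return found

def fake_dex(version, total):
    """Zero-filled blob with only the magic and file_size fields set."""
    blob = bytearray(total)
    blob[0:8] = b"dex\n" + version + b"\x00"
    struct.pack_into("<I", blob, 32, total)
    return bytes(blob)

buffer = (bytes(64) + b"dex\n035\x00" + bytes(64)
          + fake_dex(b"035", 200) + b"junk" + fake_dex(b"037", 300))
for hit in carve_dex(buffer, base=0x1000):
    print(hex(hit["vaddr"]), hit["size"], hit["truncated"])
```

The first magic is rejected because its file_size field reads zero. The two fake files are reported at their offsets plus the base address.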

The race condition

The unpatched app was running on a rooted emulator. Root detection, emulator detection, and integrity verification were active on background threads. The process would eventually kill itself.

for i, region in enumerate(candidates):
    # Check if process is still alive every 5 regions
    if i > 0 and i % 5 == 0:
        alive_out, _ = adb(f"kill -0 {pid} && echo ALIVE")
        if b"ALIVE" not in alive_out:
            print(f"[!] Process died after scanning {i} regions!")
            break

kill -0 sends signal 0 to the process, a no-op that succeeds only if the process exists. Checking every five regions balances speed against the cost of making an extra adb call. If the process dies mid-scan, the script saves whatever it has found so far and exits.
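The signal-0 probe has a direct equivalent in Python on any POSIX host, which is a convenient way to see its semantics. This is a local illustration only. The dump script performs the check on the device through adb, and is_alive is a name chosen for this sketch.

```python
import os

def is_alive(pid: int) -> bool:
    """Probe a process with signal 0: nothing is delivered, only checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False        # no such process
    except PermissionError:
        return True         # exists, but belongs to another user
    return True

print(is_alive(os.getpid()))  # True
```

The PermissionError branch matters in practice: without root, probing another app's process fails with a permission error even though the process exists.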

In practice, the process survived long enough. The scan completed before the security checks reached their verdict.

What came out

python3 dump_dex.py

[+] Found PID: 29679
[*] Reading memory maps...
[+] Found 567 memory regions
[+] 567 candidate regions, total 432.7 MB to scan

[1/567] 0x0000723db000 (6.7 MB) rw-p <anonymous>
    [DEX] Found at vaddr 0x723db000, version 035, size 8119560 bytes (7929.3 KB)
[2/567] 0x007a00080000 (15.0 MB) rw-p <anonymous>
    [DEX] Found at vaddr 0x7a00080000, version 037, size 9199248 bytes (8983.6 KB)

[*] Scan complete: 2 DEX files found

[+] Saved classes_0.dex: version=035, size=8119560 bytes
[+] Saved classes_1.dex: version=037, size=9199248 bytes

Two DEX files, recovered from anonymous memory regions:

classes_0.dex
  DEX version:  035
  Size:         8,119,560 bytes (7.7 MB)
  Virtual addr: 0x723db000
  Contents:     Flutter plugins, Firebase, WebView, HTTP clients,
                all UI activities, all API endpoint handlers

classes_1.dex
  DEX version:  037
  Size:         9,199,248 bytes (8.8 MB)
  Virtual addr: 0x7a00080000
  Contents:     risk assessment SDK, guard runtime classes,
                Kotlin stdlib, kill chain handlers,
                emulator detection UI logic

For scale:

Visible on disk:          25,256 bytes       8 classes
Recovered from memory:    17,318,808 bytes   thousands of classes
Ratio:                    686 : 1

Verification

The extracted files decompile cleanly:

baksmali d dex_dump/classes_0.dex -o dex_dump/smali_0
baksmali d dex_dump/classes_1.dex -o dex_dump/smali_1
jadx dex_dump/classes_0.dex dex_dump/classes_1.dex -d dex_dump/jadx_out

A quick header check confirms the files are structurally valid:

xxd dex_dump/classes_0.dex | head -4

00000000: 6465 780a 3033 3500 cf80 0e9f df41 9035  dex.035......A.5
00000010: 5e51 f5da 6b02 68ca e30e f33b efa9 db08  ^Q..k.h....;....
00000020: 08e5 7b00 7000 0000 7856 3412 0000 0000  ..{.p...xV4.....
00000030: 0000 0000 2ce4 7b00 64d8 0000 7000 0000  ....,.{.d...p...

The magic dex.035 is at offset 0x00. At offset 0x20, the file_size field reads 08 e5 7b 00, little-endian for 0x007be508, which is 8,119,560 bytes. That matches the file on disk exactly. The bytecode wasn’t obfuscated. Encryption had been the only protective layer. Once past it, the application’s internal structure was fully readable.
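Beyond the size field, the header carries two integrity values that can be recomputed: the Adler-32 checksum covers everything after the checksum field, and the SHA-1 signature covers everything after the signature field. The check below is a sketch that was not part of the original verification, and verify_dex is a name chosen here. A mismatch on a memory-carved file does not necessarily mean the carve is wrong, since the in-memory copy may have been modified after loading.

```python
import hashlib
import struct
import zlib

def verify_dex(data: bytes) -> dict:
    """Recompute the size, Adler-32 and SHA-1 fields of a DEX file."""
    if len(data) < 0x70:
        raise ValueError("shorter than a DEX header")
    (stored_checksum,) = struct.unpack_from("<I", data, 0x08)
    stored_signature = data[0x0C:0x20]
    (file_size,) = struct.unpack_from("<I", data, 0x20)
    return {
        "size_ok": file_size == len(data),
        # checksum covers everything after the checksum field itself
        "checksum_ok": zlib.adler32(data[0x0C:]) & 0xFFFFFFFF == stored_checksum,
        # signature covers everything after the signature field
        "signature_ok": hashlib.sha1(data[0x20:]).digest() == stored_signature,
    }
```

Usage would look like verify_dex(open("dex_dump/classes_0.dex", "rb").read()), using the output path from the commands above.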

What was inside the hidden code

Among the recovered classes, five call sites across four files implemented the application’s self-termination mechanism:

# com/vendor/risk/core/UiUtil.smali
.method public static exitApp(Landroid/content/Context;)V
    invoke-static {p0}, Lcom/vendor/sdk/b9;->c(Landroid/content/Context;)V
    const/4 v0, 0x0
    invoke-static {v0}, Ljava/lang/System;->exit(I)V    # kill
    return-void
.end method

# com/vendor/sdk/b9.smali — method c(), two locations:
    const/4 v0, 0x0
    invoke-static {v0}, Ljava/lang/System;->exit(I)V    # kill

# com/vendor/sdk/bg.smali — method a():
    invoke-static {}, Landroid/os/Process;->myPid()I
    move-result v0
    invoke-static {v0}, Landroid/os/Process;->killProcess(I)V    # kill

# kotlin/system/ProcessKt.smali:
    invoke-static {p0}, Ljava/lang/System;->exit(I)V    # kill

Four System.exit() calls. One Process.killProcess(). Ten additional methods handling emulator detection dialogs. None of this was visible in the original APK. It existed only inside the encrypted payload, which existed only in memory after decryption.

On boundaries

There is a precise limit to what application-layer encryption can achieve on a system where the operator holds root.

InMemoryDexClassLoader was built to avoid writing DEX to disk, and it succeeds. But it can’t avoid placing the DEX in addressable memory, because the ART runtime must read the bytecode to execute it. And it can’t prevent the kernel from exposing that memory through /proc, because the kernel doesn’t answer to the application.

The encrypted payload decrypts itself because it has no other choice. Once decrypted, it is legible to anyone with the privilege to read process memory.

This isn’t a flaw. It is the boundary condition of the design, the point where application-level protection meets kernel-level authority. The defense raises the cost of analysis and defeats the most common tooling. It doesn’t prevent extraction by a reader who operates below the application layer.

Reproducibility

Host:       macOS, Apple Silicon
Device:     Android emulator, arm64, API 34, Magisk root
Tools:      apktool 3.0.1, jadx, baksmali 2.5.2, adb, Python 3.12
Script:     dump_dex.py (380 lines)
Technique:  /proc/pid/mem scan for DEX magic bytes
Time:       under 30 seconds from app launch to extraction complete

The technique generalizes to any application that uses InMemoryDexClassLoader or equivalent runtime DEX loading. This includes most applications protected by commercial packing SDKs that encrypt DEX at rest and load it from memory at runtime.

The script, memory map data, and candidate regions are available for reference. Everything described here was performed with open-source tools and standard Linux interfaces.

A twenty-five-kilobyte file pretended to be an entire production application. Behind it, sixteen megabytes of hidden code handled every login, every transfer, every API call. The encryption held against static analysis. It held against Frida. It didn’t hold against the kernel’s own interface for reading what a process contains.

The code was always running. It was in a place where nobody was expected to look.

NOTE

Application names, package identifiers, SDK vendor names, and other attributes that could identify the target have been redacted. The technical substance (offsets, byte values, code structure, and extraction methodology) is unaltered.

References

DexHunter: Toward Extracting Hidden Code from Packed Android Applications — Zhang et al., ESORICS 2015
PackerGrind: An Adaptive Unpacking System for Android Apps — Xue et al., IEEE TIFS 2021
dalvik.system.InMemoryDexClassLoader — Android API reference
proc(5) — Linux man page