JVM Internals Deep Dive

Yashrajsinh

·January 15, 2025·13 min read·Advanced

The Java Virtual Machine is the runtime engine that executes every Java application you deploy to production. When your Spring Boot service experiences a sudden latency spike, when your batch processor runs out of memory processing large datasets, or when your microservice takes thirty seconds to warm up before handling traffic at full speed, the root cause often traces back to JVM behavior. Understanding how the JVM loads classes, manages memory, collects garbage, and compiles bytecode to native instructions gives you the diagnostic tools to identify and resolve these production issues rather than guessing at solutions.

This article covers the JVM internals that matter most for backend engineers running Java in production. We focus on the areas where understanding the JVM directly translates to better application performance, more accurate capacity planning, and faster incident resolution. You do not need to become a JVM engineer, but you do need to understand what happens beneath your application code to make informed decisions about memory configuration, garbage collector selection, and startup optimization.

What You Will Learn

By working through this deep dive, you will gain practical knowledge in the following areas:

The JVM architecture including the class loader subsystem, runtime data areas, and execution engine
How the class loading mechanism works including the delegation model, custom class loaders, and class initialization ordering
JVM memory areas: heap, metaspace, thread stacks, code cache, and direct memory
Garbage collection fundamentals including generational hypothesis, GC roots, and reachability analysis
Major garbage collectors: Serial, Parallel, G1, ZGC, and Shenandoah with their trade-offs
JIT compilation tiers, method inlining, escape analysis, and deoptimization
JVM flags for production tuning including heap sizing, GC selection, and diagnostic options
Monitoring and profiling tools: jcmd, jstat, jmap, async-profiler, and flight recorder
Common production issues: memory leaks, GC pauses, class loading conflicts, and native memory exhaustion
Startup optimization techniques including CDS, AOT compilation, and GraalVM native images

Prerequisites

Before diving into JVM internals, you should be comfortable with the following:

Core Java programming including classes, interfaces, generics, and exception handling as covered in the Core Java Roadmap
The Java Collections Framework and how different data structures consume memory
Basic Java concurrency concepts including threads and thread pools
Familiarity with Linux command-line tools for process inspection and monitoring
Experience deploying Java applications in containerized environments with Docker

JVM internals is an advanced topic that builds on your understanding of how Java objects are created, how references work, and how threads execute code. If you are not yet comfortable with these fundamentals, work through the prerequisite articles first.

Concept Overview

The JVM is a specification that defines how Java bytecode is loaded, verified, and executed. Multiple implementations exist (HotSpot, OpenJ9, GraalVM), but HotSpot is the reference implementation shipped with Oracle JDK and most OpenJDK distributions. When we discuss JVM internals in this article, we refer to HotSpot behavior unless otherwise noted.

The JVM architecture consists of three major subsystems:

Class Loader Subsystem — responsible for finding, loading, linking, and initializing classes from bytecode files
Runtime Data Areas — the memory regions where the JVM stores class metadata, object instances, thread execution state, and compiled code
Execution Engine — the component that interprets bytecode and compiles frequently executed methods to optimized native code through JIT compilation

These subsystems work together to provide the illusion of a simple, sequential execution model while actually performing sophisticated optimizations behind the scenes. Depending on the collector, the garbage collector does much of its work concurrently with your application threads, the JIT compiler speculatively optimizes hot code paths, and the class loader resolves dependencies lazily to minimize startup time.

Understanding this architecture helps you diagnose production issues because every observable symptom (high latency, memory growth, slow startup, CPU spikes) maps to specific JVM subsystem behavior that you can measure and tune.

Step-by-Step Explanation

This section walks through the subsystems in the order your code meets them: class loading, memory layout, garbage collection, JIT compilation, and finally the flags that tune each of them in production. Each step builds on the previous one.

Class Loading and the Delegation Model

When your application references a class for the first time, the JVM must locate the bytecode, parse it, verify its correctness, and prepare it for execution. This process is handled by the class loader subsystem, which follows a parent-delegation model.

The delegation model works as follows: when a class loader receives a request to load a class, it first delegates the request to its parent class loader. Only if the parent cannot find the class does the child attempt to load it. This ensures that core Java classes (java.lang.String, java.util.List) are always loaded by the bootstrap class loader regardless of which class loader initiated the request, preventing class identity conflicts.
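The parent-first behavior described above can be observed with a small loader that defines nothing itself. The sketch below is illustrative only: `RecordingClassLoader`, `tryLoad`, and the class name `com.example.DoesNotExist` are hypothetical names introduced here. It relies on the default `ClassLoader.loadClass` implementation, which asks the parent first and calls `findClass` only when the parent chain fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: never defines a class, only records which names
// reach findClass, i.e. names the parent chain could not load.
class RecordingClassLoader extends ClassLoader {
    final List<String> findClassCalls = new ArrayList<>();

    RecordingClassLoader(ClassLoader parent) {
        super(parent);
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        findClassCalls.add(name);
        throw new ClassNotFoundException(name);
    }

    // Returns the loaded class, or null if no loader in the chain found it.
    Class<?> tryLoad(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        RecordingClassLoader loader =
            new RecordingClassLoader(ClassLoader.getSystemClassLoader());

        // Resolved by the parent chain, so findClass is never consulted.
        Class<?> string = loader.tryLoad("java.lang.String");
        System.out.println("Same Class object as String.class: " + (string == String.class));
        System.out.println("findClass calls so far: " + loader.findClassCalls);

        // No parent knows this name, so the request finally reaches findClass.
        Class<?> missing = loader.tryLoad("com.example.DoesNotExist");
        System.out.println("Missing class loaded: " + (missing != null));
        System.out.println("findClass calls now: " + loader.findClassCalls);
    }
}
```

Loaders that isolate application code (servlet containers, OSGi) typically override `loadClass` to change this order, which is where most class identity conflicts originate.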

The standard class loader hierarchy has three levels:

Bootstrap Class Loader: loads the core JDK modules (java.base, java.logging, etc.)
Platform Class Loader: loads the remaining Java SE platform modules (java.sql, for example); it replaced the extension class loader in Java 9
Application Class Loader: loads classes from the application classpath and module path

In production, you encounter class loading issues most frequently in application servers and frameworks that use custom class loaders for isolation. Spring Boot's nested JAR class loader, OSGi bundle class loaders, and servlet container class loaders all create hierarchies that can cause ClassNotFoundException or LinkageError when the same class is loaded by different class loaders.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.GarbageCollectorMXBean;
import java.util.List;
 
public class JvmDiagnostics {
 
    public static void main(String[] args) {
        // Inspect class loader hierarchy
        ClassLoader appLoader = JvmDiagnostics.class.getClassLoader();
        System.out.println("Application ClassLoader: " + appLoader);
        System.out.println("Parent ClassLoader: " + appLoader.getParent());
        System.out.println("Bootstrap ClassLoader (null): " + appLoader.getParent().getParent());
 
        // Memory usage via MXBeans
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heapUsage = memoryBean.getHeapMemoryUsage();
        MemoryUsage nonHeapUsage = memoryBean.getNonHeapMemoryUsage();
 
        System.out.println("\n--- Heap Memory ---");
        System.out.printf("Used: %d MB%n", heapUsage.getUsed() / (1024 * 1024));
        System.out.printf("Committed: %d MB%n", heapUsage.getCommitted() / (1024 * 1024));
        System.out.printf("Max: %d MB%n", heapUsage.getMax() / (1024 * 1024));
 
        System.out.println("\n--- Non-Heap Memory (Metaspace + Code Cache) ---");
        System.out.printf("Used: %d MB%n", nonHeapUsage.getUsed() / (1024 * 1024));
        System.out.printf("Committed: %d MB%n", nonHeapUsage.getCommitted() / (1024 * 1024));
 
        // GC statistics
        System.out.println("\n--- Garbage Collectors ---");
        List<GarbageCollectorMXBean> gcBeans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : gcBeans) {
            System.out.printf("Collector: %s, Collections: %d, Time: %d ms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
 
        // Runtime information
        System.out.println("\n--- Runtime ---");
        System.out.println("JVM: " + System.getProperty("java.vm.name"));
        System.out.println("Version: " + System.getProperty("java.vm.version"));
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
 
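        // Demonstrate object allocation and GC pressure.
        // Take a fresh snapshot: heapUsage above was captured before the printing
        // code ran. The reported growth is approximate and can vary between runs.
        System.out.println("\n--- Allocation Pressure Demo ---");
        long before = memoryBean.getHeapMemoryUsage().getUsed();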
        byte[][] allocations = new byte[1000][];
        for (int i = 0; i < 1000; i++) {
            allocations[i] = new byte[1024]; // 1 KB each
        }
        MemoryUsage afterAlloc = memoryBean.getHeapMemoryUsage();
        System.out.printf("Allocated ~1 MB, heap grew by %d KB%n",
            (afterAlloc.getUsed() - before) / 1024);
    }
}

JVM Memory Areas

The JVM divides memory into several distinct regions, each serving a specific purpose. Understanding these regions is essential for capacity planning and diagnosing memory-related production issues.

The Heap is where all object instances and arrays are allocated. It is the largest memory region and the one managed by the garbage collector. Generational collectors further divide the heap based on the generational hypothesis: most objects die young. The young generation (Eden + Survivor spaces) holds newly created objects, while the old generation (Tenured space) holds objects that have survived multiple GC cycles.

Metaspace (replacing PermGen since Java 8) stores class metadata, method bytecode, constant pools, and annotation data. Unlike the old PermGen, Metaspace grows dynamically and uses native memory rather than the Java heap. In production, Metaspace growth usually indicates class loader leaks where old class loaders and their classes cannot be garbage collected because something still holds a reference to them.

Thread Stacks are per-thread memory regions that store local variables, method parameters, and return addresses for each frame in the call stack. Each thread gets a fixed-size stack (default 512 KB to 1 MB depending on the platform). StackOverflowError occurs when recursion or deeply nested calls exceed the stack size.
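To make the relationship between stack size and call depth concrete, the following sketch recurses until `StackOverflowError` on threads created with different requested stack sizes. `StackDepthProbe` and `maxDepth` are hypothetical names. The stack size passed to the `Thread` constructor is only a hint that the platform may round or ignore, and the depth reached also depends on whether frames are interpreted or compiled, so treat the numbers as indicative.

```java
class StackDepthProbe {

    // Recurses on a dedicated thread until the stack overflows and
    // returns how many frames were entered before that happened.
    static int maxDepth(long requestedStackBytes) {
        int[] depth = {0};
        Thread probe = new Thread(null, () -> {
            try {
                recurse(depth);
            } catch (StackOverflowError expected) {
                // Reaching the limit is the point of the experiment.
            }
        }, "stack-probe", requestedStackBytes);
        probe.start();
        try {
            probe.join(); // join() makes the thread's writes to depth visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return depth[0];
    }

    private static void recurse(int[] depth) {
        depth[0]++;
        recurse(depth);
    }

    public static void main(String[] args) {
        for (long kb : new long[] {256, 1024, 4096}) {
            System.out.printf("requested %4d KB -> depth %d%n", kb, maxDepth(kb * 1024));
        }
    }
}
```

The same per-thread setting is available globally through -Xss.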

Code Cache stores JIT-compiled native code. When the JIT compiler optimizes a method, the resulting machine code is stored in the code cache for subsequent invocations. If the code cache fills up, the JIT compiler stops compiling new methods, and performance degrades to interpreter speed. Monitor code cache usage in long-running applications with many classes.

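Direct Memory is off-heap memory allocated through ByteBuffer.allocateDirect() and used heavily by NIO channels, Netty, and other high-performance I/O frameworks. Because it lives outside the Java heap, the garbage collector neither scans nor compacts it. The native allocation is released only when the owning ByteBuffer object is itself garbage collected, or when a framework frees it explicitly, so direct memory can keep growing while the heap looks healthy.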
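The regions described in this section can be inspected from inside the process through the standard `java.lang.management` MXBeans. The sketch below lists every memory pool the JVM exposes and then measures direct buffer usage around one allocation. `MemoryAreaReport` and `directMemoryUsed` are hypothetical names. Pool names differ between collectors and JVM versions, and the buffer pool name "direct" is what HotSpot reports, which is assumed here.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.nio.ByteBuffer;

class MemoryAreaReport {

    // Bytes currently held by direct buffers, or -1 if no pool named "direct" exists.
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Heap pools (Eden, Survivor, Old) and non-heap pools (Metaspace, code heaps).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            System.out.printf("%-16s %-36s used=%d KB%n",
                pool.getType(), pool.getName(), usage.getUsed() / 1024);
        }

        long before = directMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(8 * 1024 * 1024);
        long after = directMemoryUsed();
        System.out.printf("Direct memory grew by %d KB for a %d KB buffer%n",
            (after - before) / 1024, buffer.capacity() / 1024);
    }
}
```

Exporting these values to your metrics system gives you the non-heap half of the picture that heap-only dashboards miss.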

Garbage Collection Fundamentals

Garbage collection is the process of automatically reclaiming memory occupied by objects that are no longer reachable from any live thread. The GC determines reachability by tracing references from a set of GC roots (thread stacks, static fields, JNI references) through the object graph. Any object not reachable from a GC root is eligible for collection.
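Reachability analysis is easier to reason about with a toy model. The sketch below is not how HotSpot implements marking. It is a minimal illustration that treats objects as string labels, references as a map from each object to the objects it points at, and performs the mark phase from a list of roots. All names (`ReachabilityDemo`, `reachable`, the labels A through E) are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ReachabilityDemo {

    // Mark phase: everything transitively referenced from the roots is live.
    static Set<String> reachable(Map<String, List<String>> references, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String object = pending.pop();
            if (marked.add(object)) { // false if already visited, so cycles terminate
                pending.addAll(references.getOrDefault(object, List.of()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> references = Map.of(
            "A", List.of("B"),
            "B", List.of("C"),
            "D", List.of("E"),  // D and E reference each other...
            "E", List.of("D")); // ...but no root leads to them

        Set<String> live = reachable(references, List.of("A"));
        System.out.println("Live objects: " + new TreeSet<>(live));
    }
}
```

D and E keep each other referenced yet are still garbage, which is why tracing collectors handle cyclic structures that pure reference counting cannot.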

The generational hypothesis states that most objects die young. In typical Java applications the large majority of objects (figures of 90% or more are commonly cited) become unreachable shortly after creation. Generational collectors exploit this by dividing the heap into young and old generations and collecting the young generation frequently (minor GC) while collecting the old generation less often (major/full GC).

A minor GC in the young generation is fast because it only needs to trace live objects (which are few) and copy them to a survivor space. Dead objects are implicitly reclaimed by not being copied. A major GC in the old generation is more expensive because it must trace the entire live object graph and compact memory to eliminate fragmentation.

G1 (Garbage First) is the default collector since Java 9. It divides the heap into equal-sized regions rather than contiguous generations. G1 prioritizes collecting regions with the most garbage first (hence the name), achieving predictable pause times by limiting how much work each GC cycle performs. G1 targets a configurable pause time goal (default 200ms) and adjusts its collection strategy to meet that target.

ZGC (production-ready since Java 15) is a low-latency collector that performs almost all GC work concurrently with application threads. ZGC pauses are typically under 1 millisecond regardless of heap size, making it suitable for latency-sensitive applications. The trade-off is slightly higher CPU overhead and memory footprint compared to G1.

Shenandoah is another low-pause collector (available in some OpenJDK distributions) that achieves sub-millisecond pauses through concurrent compaction. Like ZGC, it trades throughput for latency predictability.

JIT Compilation and Runtime Optimization

The JVM starts by interpreting bytecode instruction by instruction. As it identifies frequently executed methods (hot spots), the JIT compiler translates them to optimized native machine code. This adaptive compilation strategy means Java applications get faster over time as the JIT compiler gathers profiling data and applies increasingly aggressive optimizations.

HotSpot uses a tiered compilation system with multiple levels:

Tier 0 — Interpreter (no compilation, collects profiling data)
Tier 1-3 — C1 compiler (fast compilation, moderate optimization, continues profiling)
Tier 4 — C2 compiler (slow compilation, aggressive optimization based on collected profiles)
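The effect of tiered compilation can be made visible with a crude timing loop. The sketch below is not a rigorous benchmark (use JMH for that). It times repeated batches of the same method and prints the cumulative JIT time reported by `CompilationMXBean`. `WarmupDemo` and `checksum` are hypothetical names, and the batch sizes are arbitrary.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class WarmupDemo {

    // Arbitrary CPU-bound work that the JIT can compile and optimize.
    static long checksum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (i * 31L) ^ (i >>> 3);
        }
        return sum;
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        boolean timed = jit != null && jit.isCompilationTimeMonitoringSupported();

        long sink = 0; // consumed below so the work cannot be optimized away
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            for (int call = 0; call < 200; call++) {
                sink += checksum(100_000);
            }
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            System.out.printf("round %2d: %7d us, cumulative JIT time: %s ms%n",
                round, elapsedMicros, timed ? jit.getTotalCompilationTime() : "n/a");
        }
        System.out.println("sink = " + sink);
    }
}
```

On HotSpot the first rounds are usually the slowest. Running with -XX:+PrintCompilation shows the method moving through the tiers.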

The most impactful JIT optimizations include:

Method inlining replaces a method call with the method body, eliminating call overhead and enabling further optimizations across the inlined code. Small, frequently called methods (getters, utility functions) are prime inlining candidates.

Escape analysis determines whether an object allocated inside a method escapes that method's scope. If the object does not escape, the JIT can apply scalar replacement: instead of allocating the object on the heap, it keeps the object's fields in registers or stack slots, so the allocation never reaches the garbage collector. This is why creating short-lived objects in tight loops is often far cheaper in optimized Java code than the source suggests.
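The two methods below show the code shape that matters for escape analysis. This is a sketch with hypothetical names (`EscapeAnalysisDemo`, `Point`, `sumLocal`, `sumEscaping`). Whether the JIT actually removes the allocation in `sumLocal` depends on inlining decisions and the JVM version, so the comments describe what is possible rather than guaranteed.

```java
class EscapeAnalysisDemo {

    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        int manhattan() {
            return Math.abs(x) + Math.abs(y);
        }
    }

    // The Point never leaves this method, so it is a candidate for scalar replacement.
    static long sumLocal(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);
            total += p.manhattan();
        }
        return total;
    }

    // The Point is stored in an array the caller can see, so it escapes
    // and must be a real heap object.
    static long sumEscaping(int n, Point[] sink) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, -i);
            sink[i % sink.length] = p;
            total += p.manhattan();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        System.out.println("local:    " + sumLocal(n));
        System.out.println("escaping: " + sumEscaping(n, new Point[16]));
    }
}
```

Running this with -Xlog:gc, once normally and once with -XX:-DoEscapeAnalysis, is a simple way to compare how much young-generation collection each variant triggers.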

Loop unrolling and vectorization transform loops to process multiple iterations per cycle, leveraging SIMD instructions on modern CPUs. The JIT compiler applies these automatically when it can prove the transformation is safe.

Deoptimization occurs when the JIT compiler's assumptions are invalidated at runtime. For example, if the JIT inlines a virtual method call assuming only one implementation exists, and a new class is loaded that provides a second implementation, the JIT must deoptimize the compiled code and fall back to interpretation until it can recompile with the new information.

Production JVM Tuning

Tuning the JVM for production requires understanding your application's memory allocation patterns, latency requirements, and throughput targets. Here are the most impactful configuration decisions:

// This class demonstrates how to programmatically inspect JVM settings
// that you would typically configure via command-line flags
 
public class JvmTuningReference {
 
    /*
     * Essential production JVM flags:
     *
     * Memory sizing:
     *   -Xms4g -Xmx4g          Set initial and max heap equal (avoid resize pauses)
     *   -XX:MaxMetaspaceSize=256m  Cap metaspace to detect class loader leaks early
     *   -XX:MaxDirectMemorySize=512m  Limit off-heap NIO buffers
     *
     * GC selection (choose one):
     *   -XX:+UseG1GC                  Default, good balance of throughput and latency
     *   -XX:+UseZGC                   Sub-millisecond pauses, higher CPU overhead
     *   -XX:+UseShenandoahGC          Sub-millisecond pauses (OpenJDK builds)
     *
     * G1 tuning:
     *   -XX:MaxGCPauseMillis=100      Target pause time (default 200ms)
     *   -XX:G1HeapRegionSize=16m      Region size (auto-calculated if omitted)
     *   -XX:InitiatingHeapOccupancyPercent=45  Heap occupancy that starts concurrent marking (45 is the default; lower it to start earlier)
     *
     * Diagnostics:
     *   -XX:+HeapDumpOnOutOfMemoryError  Auto heap dump on OOM
     *   -XX:HeapDumpPath=/var/log/app/  Heap dump location
     *   -Xlog:gc*:file=gc.log:time     Unified GC logging (Java 9+)
     *   -XX:+PrintCompilation          JIT activity (add -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining for inlining decisions)
     *
     * Container awareness (Java 10+):
     *   -XX:+UseContainerSupport       Respect cgroup memory limits (default on)
     *   -XX:MaxRAMPercentage=75.0      Use 75% of container memory for heap
     *
     * Defaults and startup optimization:
     *   -XX:+UseCompressedOops         32-bit object references, a memory saving (default for heaps < 32GB)
     *   -XX:+TieredCompilation         Enable tiered JIT (default)
     *   -XX:SharedArchiveFile=app.jsa  Class Data Sharing archive
     */
 
    public static void printCurrentSettings() {
        Runtime runtime = Runtime.getRuntime();
        System.out.printf("Max heap: %d MB%n", runtime.maxMemory() / (1024 * 1024));
        System.out.printf("Total heap: %d MB%n", runtime.totalMemory() / (1024 * 1024));
        System.out.printf("Free heap: %d MB%n", runtime.freeMemory() / (1024 * 1024));
        System.out.printf("Processors: %d%n", runtime.availableProcessors());
        System.out.printf("JVM: %s %s%n",
            System.getProperty("java.vm.name"),
            System.getProperty("java.vm.version"));
    }
 
    public static void main(String[] args) {
        printCurrentSettings();
    }
}

A common production tuning mistake is setting -Xms and -Xmx to different values. When the heap needs to grow from the initial size toward the maximum, the JVM must commit additional memory from the operating system, which can add latency at the moment of growth. Setting them equal eliminates resize pauses and makes memory usage predictable for container resource limits.
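One way to verify what the JVM actually ended up with, whether the values came from -Xms/-Xmx, from percentage flags, or from defaults, is to read the effective options at startup. The sketch below assumes a HotSpot-based JVM, because `com.sun.management.HotSpotDiagnosticMXBean` is HotSpot-specific. `HeapFlagCheck` and `flagBytes` are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class HeapFlagCheck {

    // Reads a numeric HotSpot option such as "InitialHeapSize" or "MaxHeapSize".
    // Assumes a HotSpot JVM; other JVMs may not provide this MXBean.
    static long flagBytes(String optionName) {
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(hotspot.getVMOption(optionName).getValue());
    }

    public static void main(String[] args) {
        long initial = flagBytes("InitialHeapSize");
        long max = flagBytes("MaxHeapSize");
        System.out.printf("Initial heap: %d MB, max heap: %d MB%n",
            initial / (1024 * 1024), max / (1024 * 1024));
        if (initial != max) {
            System.out.println("Heap can resize at runtime: initial and max differ.");
        }
    }
}
```

Logging this once at startup makes misconfigured deployments visible without attaching a tool to the process.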

For containerized deployments with Docker, prefer -XX:MaxRAMPercentage over fixed -Xmx values. The heap then scales with whatever memory limit the container orchestrator assigns, and the remaining percentage is left for metaspace, thread stacks, and native memory. To keep the initial heap equal to the maximum in this setup, set -XX:InitialRAMPercentage to the same value.
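The headroom reasoning above is simple arithmetic, and writing it down helps when choosing a percentage. The sketch below is a back-of-the-envelope model. `ContainerMemoryBudget` and its methods are hypothetical, and the example figures (200 threads, 1 MB stacks, 256 MB caps) are assumptions for illustration. Thread stacks are counted at their full reserved size, which is an upper bound, and the JVM may round the heap size it actually uses.

```java
class ContainerMemoryBudget {

    // Heap the JVM would target for a given container limit and MaxRAMPercentage.
    static long heapMiB(long containerLimitMiB, double maxRamPercentage) {
        return (long) (containerLimitMiB * maxRamPercentage / 100.0);
    }

    // What remains for code cache, GC structures, and native libraries after
    // subtracting heap, thread stacks, and the metaspace and direct memory caps.
    static long otherNativeMiB(long containerLimitMiB, double maxRamPercentage,
                               int threads, long stackMiB,
                               long maxMetaspaceMiB, long maxDirectMiB) {
        long heap = heapMiB(containerLimitMiB, maxRamPercentage);
        long stacks = threads * stackMiB;
        return containerLimitMiB - heap - stacks - maxMetaspaceMiB - maxDirectMiB;
    }

    public static void main(String[] args) {
        long limit = 4096; // 4 GiB container (example figure)
        System.out.println("Heap (75%): " + heapMiB(limit, 75.0) + " MiB");
        System.out.println("Left for other native memory: "
            + otherNativeMiB(limit, 75.0, 200, 1, 256, 256) + " MiB");
    }
}
```

If the remainder comes out small or negative, lower the percentage, cap the pools more tightly, or raise the container limit.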

Real-World Use Cases

JVM internals knowledge directly impacts your ability to operate Java applications in production:

Diagnosing memory leaks: When your application's heap usage grows steadily over hours or days, you need to identify which objects are accumulating and why they are not being collected. Tools like jmap, Eclipse MAT, and async-profiler's allocation profiling mode help you trace retained objects back to the code that holds references to them.

Reducing GC pause times: When your API's p99 latency spikes periodically, GC pauses are often the cause. Switching from G1 to ZGC, tuning G1's pause time target, or reducing the allocation rate can shrink or eliminate these spikes.

Optimizing startup time: When deploying to AWS ECS with auto-scaling, slow JVM startup means new instances cannot handle traffic quickly. Class Data Sharing (CDS) and application CDS (AppCDS) cut class loading time at startup, and GraalVM native images can reduce startup from seconds to tens of milliseconds.

Capacity planning for containers: When sizing container memory limits, you must account for heap, metaspace, thread stacks (stack size times thread count), code cache, direct memory, and native memory used by libraries. A 4 GB container running a Java application typically needs -Xmx set to no more than 2.5-3 GB to leave room for non-heap memory.

Debugging class loading conflicts: When your application throws NoSuchMethodError or ClassCastException for classes that clearly exist, the issue is usually multiple versions of the same class on the classpath or the same class loaded by different class loaders. Adding -verbose:class to the JVM flags (or -Xlog:class+load on Java 9+) shows where each class was loaded from, which identifies the JAR that actually supplied it.
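For class loading conflicts in particular, the same question can be answered from inside the application for one suspicious class. The helper below is a small sketch. `ClassOrigin` and `describe` are hypothetical names. Classes loaded by the bootstrap loader report a null class loader and typically no code source, which the helper handles explicitly.

```java
import java.security.CodeSource;

class ClassOrigin {

    // Describes which loader defined a class and where its bytes came from.
    static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        CodeSource source = type.getProtectionDomain().getCodeSource();

        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        String location = (source == null || source.getLocation() == null)
            ? "(no code source reported)"
            : source.getLocation().toString();

        return type.getName() + " <- " + loaderName + " @ " + location;
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ClassOrigin.class));
    }
}
```

Calling `describe` on both sides of a failing cast quickly shows whether two loaders or two JARs are supplying the same class name.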

Best Practices

Follow these guidelines when operating Java applications in production:

Set heap size explicitly and make initial equal to maximum. Never rely on JVM defaults for production workloads. Setting -Xms equal to -Xmx eliminates heap resize pauses and makes memory usage predictable for container orchestrators and monitoring systems.

Enable GC logging unconditionally in production. GC logs have negligible performance impact and provide essential diagnostic data when issues occur. Use -Xlog:gc*:file=gc.log:time,uptime,level,tags on Java 9+ for structured, parseable output.

Choose your garbage collector based on your latency requirements. G1 is the right default for most applications. Switch to ZGC or Shenandoah only if you need sub-millisecond pause times and can accept the CPU overhead. Use Parallel GC only for batch workloads where throughput matters more than individual pause duration.

Monitor metaspace and code cache alongside heap. Metaspace growth indicates class loader leaks. Code cache exhaustion causes JIT compilation to stop, degrading performance to interpreter speed. Set -XX:MaxMetaspaceSize to detect leaks early rather than letting metaspace grow until the container is killed.

Always enable heap dumps on OutOfMemoryError. The flag -XX:+HeapDumpOnOutOfMemoryError costs nothing during normal operation and provides the diagnostic data you need when an OOM occurs. Without it, you must reproduce the issue to diagnose it, which may be impossible for transient production conditions.

Use container-aware JVM settings. Since Java 10, the JVM respects cgroup memory limits by default. Use -XX:MaxRAMPercentage=75.0 to set heap as a percentage of container memory, leaving 25% for non-heap usage. This is more maintainable than hardcoded -Xmx values that must change when container limits change.
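Heap dumps do not have to wait for an OutOfMemoryError. As a complement to the heap dump practice above, the sketch below triggers a dump on demand through `HotSpotDiagnosticMXBean.dumpHeap`, which is HotSpot-specific. `HeapDumper` and `dumpLiveObjects` are hypothetical names, and the file naming scheme is an assumption. The target file must not already exist, and recent JDKs require the .hprof extension.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapDumper {

    // Writes a heap dump containing only reachable objects into the given directory.
    // Assumes a HotSpot JVM. Dumping pauses the application while the file is written.
    static Path dumpLiveObjects(Path directory) {
        Path target = directory.resolve("heap-" + System.currentTimeMillis() + ".hprof");
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            hotspot.dumpHeap(target.toString(), true); // true = live objects only
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("heapdump-demo");
        Path dump = dumpLiveObjects(directory);
        System.out.printf("Wrote %s (%d KB)%n", dump, Files.size(dump) / 1024);
    }
}
```

The resulting file opens in Eclipse MAT like any dump produced by jmap or jcmd. Make sure the target volume has at least as much free space as the heap in use.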

Common Mistakes

These JVM-related mistakes are the most common in production Java deployments:

Setting -Xmx too close to container memory limit. The JVM uses memory beyond the heap: metaspace, thread stacks, code cache, direct buffers, and native memory from libraries. If -Xmx equals the container limit, the container will be OOM-killed by the kernel before the JVM can throw OutOfMemoryError. Leave at least 25-30% headroom.

Ignoring the warmup period. JIT compilation takes time. A freshly started JVM interprets bytecode and gradually compiles hot methods. During this warmup period (typically 30-120 seconds for complex applications), latency is higher and throughput is lower. Account for this in load balancer health checks and auto-scaling policies.

Sizing the heap just above 32 GB. Compressed ordinary object pointers (-XX:+UseCompressedOops, on by default) reduce memory usage by storing references in 32 bits instead of 64. The JVM automatically disables this optimization once the maximum heap reaches roughly 32 GB, so every reference doubles in size and a heap slightly above the threshold can hold fewer objects than one just below it. If you need more than 32 GB of heap, consider whether two smaller JVM instances would be more efficient.

Not accounting for thread stack memory. Each thread reserves stack memory (default 1 MB on 64-bit Linux). An application with 500 threads can therefore reserve up to 500 MB of native memory for stacks alone, none of which counts against -Xmx. This is a common cause of container OOM kills that confuse engineers who only monitor heap usage.

Disabling tiered compilation for faster startup. Some guides recommend -XX:-TieredCompilation to reduce startup overhead. While this skips C1 compilation, methods then run in the interpreter until they are hot enough for C2, so the application spends longer at interpreter speed and warmup gets slower rather than faster. Tiered compilation is almost always the right choice.

Summary

The JVM is a sophisticated runtime that provides automatic memory management, adaptive compilation, and platform independence. For production Java engineers, understanding its internals is not academic knowledge but a practical necessity for diagnosing performance issues, sizing infrastructure correctly, and making informed architectural decisions.

The key areas to master are memory layout (heap generations, metaspace, thread stacks, code cache), garbage collection (generational hypothesis, collector selection, pause time tuning), and JIT compilation (tiered compilation, inlining, escape analysis). These three subsystems interact to determine your application's throughput, latency, and resource consumption.

As you deploy Java applications to AWS infrastructure with containers and auto-scaling, JVM tuning becomes a critical operational skill. The difference between a well-tuned JVM and a default configuration can be the difference between meeting your SLA and experiencing cascading failures under load. Invest time in understanding these internals and you will be equipped to operate Java at any scale.
