Runtime profiling in OpenJDK's HotSpot JVM

November 18, 2021
Roland Westrelin
Related topics: Compilers, Java

    In a previous article, I explained that OpenJDK's Just-in-Time (JIT) compiler relies heavily on speculation to achieve higher performance. This optimization technique generates code under an assumption that is expected to be correct but can't be proven certain. If the assumption turns out to be wrong, the JIT compiler prevents incorrect execution by returning control to the bytecode interpreter. This process is called deoptimization.

    This article explains how profiling at runtime can improve speculation and contribute to optimization in other ways. I'll show examples of profile data collected at runtime, explain how the JIT compiler uses the data, and illustrate the benefits of runtime profiling.

    Patterns in compiler speculation

    Speculation means that the compiler starts by making an assumption that is likely true. Sometimes, as with null checks, a reasonable assumption is almost obvious: Null checks rarely cause an exception, so speculating that they don't cause one is expected to pay off.

    A similar pattern exists for range checks. Range checks are tests that guard accesses to array elements in Java code, protecting the program from out-of-bounds accesses. In practice, an array access looks like the following pseudocode:

    int[] array; // an array of some type
    int i;       // some index into the array
    
    if (array == null) {              // the null check
      throw new NullPointerException();
    }
    
    if (i < 0 || i >= array.length) { // the range check
      throw new ArrayIndexOutOfBoundsException();
    }
    
    int v = array[i];

    Failing range checks are usually caused by programmer errors, as is the case for null pointer exceptions. Therefore, it is not considered crucial for performance to handle a failed check efficiently. With that consideration in mind, the previous snippet can be compiled down to:

    int[] array;
    int i;
    
    if (array == null) {
      deoptimize(); // exit compiled code; the interpreter will throw
    }
    
    if (i < 0 || i >= array.length) {
      deoptimize(); // likewise for a failed range check
    }
    
    int v = array[i];

    If a thread makes an out-of-bounds access, it reaches the deoptimize() pseudo-call, which causes the thread's execution to exit the compiled code and resume in the interpreter at the exact same point in the Java method. Interpreted execution then throws the exception. As a consequence, the exception-throwing logic does not need to be included in the compiled code.

    Why we need runtime profiling

    Consider this example illustrating the limits of speculation:

    Object o;
    ...
    
    if (o instanceof C) {
      ...
    }

    It would be nice to have the compiler speculate here as well, on whether the instanceof test will succeed. But without knowing anything else about the program, how can we tell whether the test is more likely to succeed or fail? Can we guess a likely type for the o variable from this expression? A good answer to either question appears unlikely from looking at this code snippet alone.

    Using runtime data to drive our decisions is a much better solution. If we expect the test to fail, we can compile the snippet down to the following code. It falls through to the rest of the compiled code when the test fails, and returns to the interpreter when the test succeeds:

    if (o instanceof C) {
      deoptimize();
    }

    If we expect the test to succeed, we can instead insert the following code. It falls through to the rest of the compiled code when the test succeeds and returns to the interpreter when the test fails:

    if (!(o instanceof C)) {
      deoptimize();
    }

    Hot methods and tiered compilation

    My previous article explained that JIT compilation is resource-hungry. Because of this, it pays off only if compilation resources are focused on "hot" methods, those expected to execute often, rather than spent on highly optimizing every method that executes at least once.

    As a consequence, all code is executed initially either in the interpreter or as compiled code generated at a low level of optimization. Only hot methods run at the highest level of optimization. This optimization choice is known as tiered compilation. The highest level of optimization is the high tier. For the rest of this discussion, I will designate all other levels as low tiers.

    Runtime profiling for hot methods

    Profiling helps determine when promoting a method to the high tier is justified. Code at the low tiers is instrumented to increment counters that enable the detection of hot methods. When the counters cross a threshold, the transition to highly optimized compiled code is triggered.

    For hot methods, the speed of highly optimized code is crucial. The speed of code running at low tiers is not as important, as long as methods that turn out to be hot are promoted to highly optimized code early in the program run.

    The key idea, then, is to observe what the program does when it runs at the low tiers by using instrumentation, and to record important observations, effectively collecting profile data. Once a method is being compiled at the high tier, the compiler can look for meaningful data points in what was collected and use them to drive speculation. Profiling slows down execution at the low tiers, but carefully crafted profiling pays off overall: a slight slowdown in code running at the low tiers lets us generate high-tier code that is measurably faster. That, in turn, can result in an overall speedup of applications running on the JVM.
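
    As a rough sketch of the idea, low-tier instrumentation conceptually amounts to per-method counters like the following. This is a hypothetical illustration, not HotSpot's actual code; the real counters live in per-method profile data, and the actual thresholds are controlled by VM flags such as -XX:CompileThreshold:

    class InvocationProfile {
        static final int COMPILE_THRESHOLD = 10_000; // illustrative value only
    
        private int invocationCount;
        private int backedgeCount; // loop back-edges count too, so hot loops qualify
    
        void onMethodEntry() {
            invocationCount++;
            maybeCompile();
        }
    
        void onLoopBackedge() {
            backedgeCount++;
            maybeCompile();
        }
    
        private void maybeCompile() {
            if (invocationCount + backedgeCount > COMPILE_THRESHOLD) {
                requestHighTierCompilation(); // hand the method to the optimizing compiler
            }
        }
    
        private void requestHighTierCompilation() {
            // Placeholder: in the real VM, this enqueues a compilation task.
        }
    }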

    Benefits of profile-guided optimization

    Profile-guided optimizations are not specific to virtual machines; some offline compilers offer them as well. Taking advantage of them usually means performing training runs driven by synthetic workloads and then feeding the collected data to the build process that produces the production-ready binaries. As a consequence, with offline compilers, leveraging this optimization requires extra work from developers. With the JVM design explained in this section, profile data is collected during the application's actual execution. The resulting data should be more faithful, and it comes at no cost to the developer.

    Example #1: Optimizing a subtype check with profile data

    What kind of profile data would make sense for the type check shown earlier? Are we only interested in the success or failure of the instanceof test? Knowing that would allow us to trim unneeded paths from the method, which is certainly good, but we should hope for more.

    Consider the cost of instanceof (and type checks, in general). As implemented in HotSpot, the OpenJDK JVM, type checks generally require several tests and even, in some cases, a loop. Can profile data help both determine the result of the type check and improve its execution time?
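
    To see where those tests and the loop come from, here is a hedged sketch of the classic fast subtype check scheme that HotSpot builds on. The field names are illustrative; the real implementation operates on the VM's internal class metadata rather than Java objects:

    class Klass {
        static final int PRIMARY_LIMIT = 8;  // illustrative limit on "shallow" class depth
    
        int depth;                   // depth of this class in the hierarchy
        Klass[] primarySupers;       // fixed-size array of superclasses (including this)
        Klass[] secondarySupers;     // remaining supertypes, such as interfaces
        Klass secondarySuperCache;   // last secondary supertype matched
    
        boolean isSubtypeOf(Klass sup) {
            if (this == sup) {                     // exact match: a single comparison
                return true;
            }
            if (sup.depth < PRIMARY_LIMIT) {       // superclass check: one indexed load
                return primarySupers[sup.depth] == sup;
            }
            if (secondarySuperCache == sup) {      // cache hit avoids the scan
                return true;
            }
            for (Klass s : secondarySupers) {      // worst case: the loop
                if (s == sup) {
                    secondarySuperCache = sup;
                    return true;
                }
            }
            return false;
        }
    }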

    HotSpot collects the type of the object involved in a type check. In the snippet shown in the previous section, let's say that o is always observed to be of type A. Then, the snippet can be compiled down to the following (expressed as pseudocode):

    if (o.getClass() != A.class) {
      deoptimize();
    }
    
    if (C.class.isAssignableFrom(A.class)) { // this was o instanceof C
      ...
    }

    The o instanceof C check can be transformed to C.class.isAssignableFrom(A.class) because the preceding runtime check guarantees that o is of type A. Given that both A and C are constant, that second check folds at compile time and is eliminated.

    As a result, the time taken by instanceof improves and an unneeded path in the method is trimmed.

    But there's another benefit to this optimization. With the snippet compiled into the following code, o would be known to be a subclass of C following a successful test:

    if (!(o instanceof C)) {
      deoptimize();
    }

    But if the snippet is compiled to the following, o is known more precisely to be of class A following the test:

    if (o.getClass() != A.class) {
      deoptimize();
    }

    Knowing the exact type of o will likely help us further optimize code that follows, whereas knowing simply that o is a subtype of C wouldn't. Suppose that later in the method, we have:

    if (o instanceof B) {
      ...
    }

    and B is a subtype of C. Knowing only that o is a subtype of C doesn't help with this test. Knowing that o is of class A allows this test to fold away.
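
    Putting the pieces together, and assuming (as in this example) that A extends C but is not a subtype of B, the whole sequence could plausibly be compiled down to the following pseudocode, with both instanceof tests folded away:

    if (o.getClass() != A.class) {
      deoptimize();
    }
    
    // o instanceof C folds to true (A is a subtype of C): the branch body runs unconditionally.
    ...
    
    // o instanceof B folds to false (A is not a subtype of B): the branch is removed entirely.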

    To summarize, in this case, profile data helps us to:

    • Remove an unneeded path.
    • Decrease the cost of the type check.
    • Improve the compiler's knowledge of types so that it can better optimize the surrounding code.

    This example has shown how profile data improves speculation (through the first two items in the list) and positively interacts with existing classical optimizations (through the third item).

    Example #2: Optimizing never-taken branches

    Now, what if, in the example snippet, o is seen to have either type A or B, both subtypes of C? Profile data then doesn't report a single type for o, and the transformation discussed in the previous section can't apply anymore. We would still want the never-taken branch to optimize out, though.

    In HotSpot, low tiers count the number of times a branch is taken. For conditional branches (remember that HotSpot operates on Java bytecode that can contain unconditional branches), HotSpot also counts the number of times the branch is not taken.

    This profile data, in turn, indicates whether a branch is always taken (indicated when the untaken count is zero) or never taken (indicated when the taken count is zero). This information allows the JIT to trim paths that are expected to never be executed. The optimization applies to all types of branches, not only to those checking for a subtype.
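
    Conceptually, this instrumentation amounts to a pair of counters per conditional branch, along the lines of the following hypothetical sketch (HotSpot's real equivalent is the BranchData entry shown in the next section):

    class BranchProfile {
        long taken;
        long notTaken;
    
        void record(boolean branchTaken) {
            if (branchTaken) {
                taken++;
            } else {
                notTaken++;
            }
        }
    
        // A branch that has executed but never (or always) gone one way
        // is a candidate for trimming the corresponding path.
        boolean neverTaken()  { return taken == 0 && notTaken > 0; }
        boolean alwaysTaken() { return notTaken == 0 && taken > 0; }
    }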

    Peeking at profile data

    As is true for most aspects of code execution in HotSpot, it's hard to observe what profile data is available to the high tier and how that data is used. Observing it requires carefully crafted benchmarks that expose performance variations, or an inspection of the generated assembly code. One way to peek into what profile data is collected is to run the java command with the (very verbose) -XX:+PrintMethodData diagnostic option. For instance, assume we want to profile the following example:

    public class TestProfileData {
        public static void main(String[] args) {
            for (int i = 0; i < 20_000; i++) {
                test(new A());
    //            test(new B());
    //            test(new D());
            }
        }
    
        private static void test(Object o) {
            if (o instanceof C) {
                System.out.println("A subtype");
            }
        }
    
        private static class C {}
    
        private static class A extends C {}
    
        private static class B extends C {}
    
        private static class D {}
    }

    Running the code with the -XX:+PrintMethodData option produced the following output with an OpenJDK 16 build:

    $ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintMethodData TestProfileData
    ...
    static TestProfileData::test(Ljava/lang/Object;)V
      interpreter_invocation_count:     7308 
      invocation_counter:               7308 
      backedge_counter:                    0 
      mdo size: 576 bytes
    
    0 fast_aload_0
    1 instanceof 16 <TestProfileData$C>
      0   bci: 1    ReceiverTypeData    count(0) nonprofiled_count(0) entries(1)
                                        'TestProfileData$A'(6797 1.00)
    4 ifeq 15
      56  bci: 4    BranchData          taken(0) displacement(88)
                                        not taken(6797)
    7 getstatic 18 <java/lang/System.out/Ljava/io/PrintStream;> 
    10 fast_aldc A subtype
    12 invokevirtual 26 <java/io/PrintStream.println(Ljava/lang/String;)V> 
      88  bci: 12   VirtualCallData     count(0) nonprofiled_count(0) entries(1)
                                        'java/io/PrintStream'(6797 1.00)
    15 return
    ...

    The command produces long output; the truncated excerpt shown here covers the method performing the instanceof test. The output lists all of the method's bytecodes. Below the method name and signature appear the counters used to trigger compilation (the lines containing interpreter_invocation_count, invocation_counter, and backedge_counter).

    Then for bytecode 1, the instanceof test, TestProfileData$A is listed as the only class seen at this bytecode (see the line containing ReceiverTypeData and the following line). Bytecode 4 is the branch that takes the result of the instanceof test as input. The branch is shown as never taken (see the line containing BranchData and the following line).

    Uncommenting line 5 of the source code, which calls test() with an instance of B, leads to a slightly different profile:

    static TestProfileData::test(Ljava/lang/Object;)V
      interpreter_invocation_count:     6882 
      invocation_counter:               6882 
      backedge_counter:                    0 
      mdo size: 576 bytes
    
    0 fast_aload_0
    1 instanceof 19 <TestProfileData$C>
      0   bci: 1    ReceiverTypeData    count(0) nonprofiled_count(0) entries(2)
                                        'TestProfileData$B'(3186 0.50)
                                        'TestProfileData$A'(3185 0.50)
    
    4 ifeq 15
      56  bci: 4    BranchData          taken(0) displacement(88)
                                        not taken(6371)
    7 getstatic 21 <java/lang/System.out/Ljava/io/PrintStream;> 
    10 fast_aldc A subtype
    12 invokevirtual 29 <java/io/PrintStream.println(Ljava/lang/String;)V> 
      88  bci: 12   VirtualCallData     count(0) nonprofiled_count(0) entries(1)
                                        'java/io/PrintStream'(6371 1.00)
    15 return

    Two different classes are seen at the instanceof test, with roughly equal counts (see the line containing ReceiverTypeData and the following two lines). The branch is still never taken (see the line containing BranchData and the following line).

    Uncommenting line 6 in the source code, which calls test() with an instance of D, yields the following:

    static TestProfileData::test(Ljava/lang/Object;)V
      interpreter_invocation_count:     8452 
      invocation_counter:               8452 
      backedge_counter:                    0 
      mdo size: 576 bytes
    
    0 fast_aload_0
    1 instanceof 22 <TestProfileData$C>
      0   bci: 1    ReceiverTypeData    count(4294964573) nonprofiled_count(0) entries(2)
                                        'TestProfileData$D'(4 1.00)
                                        'TestProfileData$A'(2723 680.75)
    4 ifeq 15
      56  bci: 4    BranchData          taken(2723) displacement(88)
                                        not taken(5445)
    7 getstatic 24 <java/lang/System.out/Ljava/io/PrintStream;> 
    10 fast_aldc A subtype
    12 invokevirtual 32 <java/io/PrintStream.println(Ljava/lang/String;)V> 
      88  bci: 12   VirtualCallData     count(0) nonprofiled_count(0) entries(1)
                                        'java/io/PrintStream'(5445 1.00)
    15 return

    It's notable (see the line containing ReceiverTypeData and the following two lines) that only two types are recorded at the instanceof test, even though three should be seen at runtime. One type (class B in this run) was not recorded.

    The reason B is missing is that the compiler wouldn't be able to take advantage of a third recorded type if there were one, so it makes little sense to record it. Actually, the compiler doesn't even need the second recorded type; all it needs is some indication that more than one type was seen. The decision to collect two types at the instanceof test, when one would be good enough, is an artifact of the implementation: The same type-profiling logic is used at virtual calls, where the compiler can make use of more than one type entry in the profile data. The global count (on the line containing ReceiverTypeData) is what tells the compiler that some types were not recorded.
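
    The layout hinted at by this output can be pictured as a small fixed-size table plus a global count, roughly like the following hypothetical sketch (the names are illustrative; ReceiverTypeData is the real structure):

    class TypeProfile {
        final Class<?>[] types = new Class<?>[2]; // a fixed number of rows
        final long[] counts = new long[2];
        long globalCount; // bumped when a seen type no longer fits in the table
    
        void record(Class<?> seen) {
            for (int i = 0; i < types.length; i++) {
                if (types[i] == seen) { counts[i]++; return; }                     // known type
                if (types[i] == null) { types[i] = seen; counts[i] = 1; return; }  // free row
            }
            globalCount++; // table full: only remember that some other type was seen
        }
    }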

    The branch is now reported as usually not taken, but still sometimes taken (see the line containing BranchData and the following line).

    Conclusion

    This article offered a couple of examples of code patterns that are optimized by collecting profile data at runtime and relying on speculation. Profiling is such a powerful technique that the HotSpot JVM uses it in many other ways not discussed here. For instance, profiling helps determine what call is hot in a method and drives inlining, a key performance optimization. Profiling and speculation together are especially powerful, because they can dramatically improve performance while transparently adapting to the workload being run on the JVM. They also interact well with other classic optimizations.

    Last updated: January 12, 2024

    Related Posts

    • How the JIT compiler boosts Java performance in OpenJDK

    • Shenandoah garbage collection in OpenJDK 16: Concurrent reference processing

    • Collect JDK Flight Recorder events at runtime with JMC Agent

    • How the JVM uses and allocates memory
