Skip to main content
Redhat Developers  Logo
  • Products

    Featured

    • Red Hat Enterprise Linux
      Red Hat Enterprise Linux Icon
    • Red Hat OpenShift AI
      Red Hat OpenShift AI
    • Red Hat Enterprise Linux AI
      Linux icon inside of a brain
    • Image mode for Red Hat Enterprise Linux
      RHEL image mode
    • Red Hat OpenShift
      Openshift icon
    • Red Hat Ansible Automation Platform
      Ansible icon
    • Red Hat Developer Hub
      Developer Hub
    • View All Red Hat Products
    • Linux

      • Red Hat Enterprise Linux
      • Image mode for Red Hat Enterprise Linux
      • Red Hat Universal Base Images (UBI)
    • Java runtimes & frameworks

      • JBoss Enterprise Application Platform
      • Red Hat build of OpenJDK
    • Kubernetes

      • Red Hat OpenShift
      • Microsoft Azure Red Hat OpenShift
      • Red Hat OpenShift Virtualization
      • Red Hat OpenShift Lightspeed
    • Integration & App Connectivity

      • Red Hat Build of Apache Camel
      • Red Hat Service Interconnect
      • Red Hat Connectivity Link
    • AI/ML

      • Red Hat OpenShift AI
      • Red Hat Enterprise Linux AI
    • Automation

      • Red Hat Ansible Automation Platform
      • Red Hat Ansible Lightspeed
    • Developer tools

      • Red Hat Trusted Software Supply Chain
      • Podman Desktop
      • Red Hat OpenShift Dev Spaces
    • Developer Sandbox

      Developer Sandbox
      Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared Openshift and Kubernetes cluster.
    • Try at no cost
  • Technologies

    Featured

    • AI/ML
      AI/ML Icon
    • Linux
      Linux Icon
    • Kubernetes
      Cloud icon
    • Automation
      Automation Icon showing arrows moving in a circle around a gear
    • View All Technologies
    • Programming Languages & Frameworks

      • Java
      • Python
      • JavaScript
    • System Design & Architecture

      • Red Hat architecture and design patterns
      • Microservices
      • Event-Driven Architecture
      • Databases
    • Developer Productivity

      • Developer productivity
      • Developer Tools
      • GitOps
    • Secure Development & Architectures

      • Security
      • Secure coding
    • Platform Engineering

      • DevOps
      • DevSecOps
      • Ansible automation for applications and services
    • Automated Data Processing

      • AI/ML
      • Data Science
      • Apache Kafka on Kubernetes
      • View All Technologies
    • Start exploring in the Developer Sandbox for free

      sandbox graphic
      Try Red Hat's products and technologies without setup or configuration.
    • Try at no cost
  • Learn

    Featured

    • Kubernetes & Cloud Native
      Openshift icon
    • Linux
      Rhel icon
    • Automation
      Ansible cloud icon
    • Java
      Java icon
    • AI/ML
      AI/ML Icon
    • View All Learning Resources

    E-Books

    • GitOps Cookbook
    • Podman in Action
    • Kubernetes Operators
    • The Path to GitOps
    • View All E-books

    Cheat Sheets

    • Linux Commands
    • Bash Commands
    • Git
    • systemd Commands
    • View All Cheat Sheets

    Documentation

    • API Catalog
    • Product Documentation
    • Legacy Documentation
    • Red Hat Learning

      Learning image
      Boost your technical skills to expert-level with the help of interactive lessons offered by various Red Hat Learning programs.
    • Explore Red Hat Learning
  • Developer Sandbox

    Developer Sandbox

    • Access Red Hat’s products and technologies without setup or configuration, and start developing quicker than ever before with our new, no-cost sandbox environments.
    • Explore Developer Sandbox

    Featured Developer Sandbox activities

    • Get started with your Developer Sandbox
    • OpenShift virtualization and application modernization using the Developer Sandbox
    • Explore all Developer Sandbox activities

    Ready to start developing apps?

    • Try at no cost
  • Blog
  • Events
  • Videos

Range check elimination in loops in OpenJDK's HotSpot JVM

March 16, 2022
Roland Westrelin
Related topics:
Java
Related products:
Red Hat OpenShift

Share:

In Java, as I discussed in a previous article on Red Hat Developer, every array access is guarded by a range check, a test that guards references to array elements and protects the program from out-of-bound accesses by throwing an ArrayIndexOutOfBoundsException. In this article, you'll see how HotSpot, OpenJDK's Java Virtual Machine, can improve application performance by cleverly transforming code to eliminate the need for range checks.

Note: This article uses interval notation, which consists of parentheses and brackets, to describe ranges. If you're unfamiliar with it, check out the table on interval notation in this document.

Why range checks matter for performance

Code where your application spends much of its time, called hot code, is just-in-time compiled in HotSpot. Given that out-of-bounds accesses are usually programmer errors, HotSpot compiles the bound check for an array[i] read or write down to:

if (i < 0 || i >= array.length) {
  deoptimize();
}

In other words, if the check fails, thread execution switches to the interpreter rather than continuing in compiled code. The interpreter re-executes the bound check and throws the exception.

The two comparisons of the bound check can be expressed as a single unsigned one:

if (i >=u array.length) {
  deoptimize();
}

Indeed, a negative i, once converted into an unsigned number, is an integer larger than the maximum signed integer.

Most array accesses are performed in loops. As a result, array range checks are frequently executed and finding ways to decrease their cumulative cost is particularly important.

The basics of range check elimination in loops

Consider a generic loop with an array bound check, which we'll call Loop 1:


for (int i = start; should_continue(i); i = next(i)) {
  if (element(i) >=u array.length) {
    deoptimize();
  }
  // some code that uses element(i)'s array element
}

Here, should_continue() tests a loop exit condition on i, next() computes the value of i for the next iteration, and element() computes the index of the array element to access.

Now, assume that, over the set of values that i takes as the loop executes, you can compute that element(i) is in the range [elementmin, elementmax]. If the range check succeeds for both bounds of the range, it's guaranteed to succeed for all i values in that loop. This observation allows the elimination of the bound check from the loop. That's illustrated in the following code, which we'll call Loop 2:


if (element_min >=u array.length) {
  deoptimize();
}

if (element_max >=u array.length) {
  deoptimize();
}

for (int i = start; should_continue(i); i = next(i)) {
  // some code that uses element(i)'s array element
}

If the range check fails for some value of i in Loop 1, one of the two checks before the loop body of Loop 2 will catch it. At that point, the thread would deoptimize, continue execution in the interpreter, and execute the loop body with actual bound checks for each iteration until it hits the value of i that causes the out-of-bound exception to be thrown. Note that the tests that precede the loop above only tell us that the bound check fails for some value of i, but not for which value.

Now, consider a loop body that looks like the following. We'll call it Loop 3:

for (int i = start; should_continue(i); i = next(i)) {
  if (element(i) >=u array.length) {
    throw new ArrayIndexOutOfBoundsException();
  }
  // some code that uses element(i)'s array element
}

If you applied the same transformation to Loop 3 that turned Loop 1 into Loop 2, you'd get the following—call it Loop 4:

if (element_min >=u array.length) {
  throw new ArrayIndexOutOfBoundsException();
}

if (element_max >=u array.length) {
  throw new ArrayIndexOutOfBoundsException();
}

for (int i = start; should_continue(i); i = next(i)) {
  // some code that uses element(i)'s array element
}

But this transformation isn't legal. If an out-of-bound access occurs for some i in Loop 3, the exception is thrown before Loop 4 starts executing and not at the iteration where the failing access should execute. What makes the transformation from Loop 1 to Loop 2 legal is the fact that range checks deoptimize rather than throw. In short, speculation is what allows this optimization.

Now consider the following code, which we'll call Loop 5:

for (int i = start; should_continue(i); i = next(i)) {
  if (some_condition) {
    // frequently executed but not always
    if (element(i) >=u array.length) {
      deoptimize();
    }
  }
  // some code that uses element(i)'s array element
}

Loop 5 could be transformed as follows into Loop 6:

if (element_min >=u array.length) {
  deoptimize();
}

if (element_max >=u array.length) {
  deoptimize();
}

for (int i = start; should_continue(i); i = next(i)) {
  if (some_condition) {
    // frequently executed but not always
  }
}

In this case, it's possible that one of the conditions before Loop 6 fails even though no range check fails in Loop 5. In other words, for the values of i for which the range check would fail in Loop 5, the range check is not executed (because in those situations some_condition is false). In that case, Loop 6 causes a deoptimization, and when execution resumes in the interpreter, no exception is thrown, so the false positive doesn't affect correctness.

The technique described here to hoist range checks out of loops is called loop predication in HotSpot.

Range check elimination in practice

Obviously, computing elementmin and elementmax in the most general case is out of reach for the compiler. HotSpot addresses this by restricting the shape of the loop functions should_continue() and next() and of the addressing function element() to shapes commonly used by developers. These shapes are simple enough that it's feasible to reason about them. In particular:

  • next(i) is i = i + stride, with stride a positive (the loop goes up) or negative (the loop goes down) constant.
  • should_continue(i) is i < stop or i > stop (with stop some integer value) depending on whether the loop goes up or down.
  • element(i) is scale * i + offset, with scale and offset required to be loop-invariant integer values.

A loop with the shape defined by points 1 and 2 above is called a counted loop. (Note that the following discussion assumes that scale and stride are positive numbers, but the reasoning described here can be generalized to all combinations of signs for scale and stride.)

The loop variable i takes values in the range [start, stop). The last value that i takes is in the range [stop - stride, stop). That is, if stride is 1, the last value of i is in [stop - 1, stop)—or, in other words, is stop - 1. If stride is 2, the last value of i is in [stop - 2, stop) and can be stop - 2 or stop - 1, depending on the exact value of start. Let's call that last value last.

With i in the range [start, last], it's then feasible to compute the minimal and the maximal values taken by element(i): element(i) is in [scale * start + offset, scale * last + offset].

last can be computed with:

start + ((stop - start - 1) / stride) * stride

The division operations here round down. For instance, if start is 1, stop is 10, and stride is 3, i takes the values 1, 4, and 7. last is 1 + ((10 - 1 - 1) / 3) * 3 = 7.

With elementmin and elementmax computed, it's now possible to hoist the range check using the technique described previously.

Another range check elimination technique

Again, hoisting of range checks out of loop as described so far is only possible when the range checks are speculated never to fail, and a failing range check causes deoptimization. However, HotSpot implements a second range check elimination pass that wouldn't in principle require speculation, although in practice, it's only implemented in HotSpot for speculative range checks, as speculation is so pervasive.

Given a counted loop and a range check, assume you can compute [imin, imax), which is a range of values of i for which you know the range check is guaranteed to not fail. How can this be leveraged to eliminate bound checks for most loop iterations? The trick, shown in the next listing, is to make copies of the loop and run each copy for a smaller number of iterations; the combination of the copies runs all the iterations.

Note that the main loop has no range check; there's no need for one, as it executes the range of values of i for which there can't be a range check failure. For most programs, most of the execution time should then be spent in that loop. If a range check fails, then it happens in the pre or post loops, which still have the required logic to throw an exception.

int i;

// pre loop
for (i = start; i < imin; i += stride)) {
  if (scale * i + offset >=u array.length) {
    deoptimize();
  }
  // some code that uses element(i)'s array element
}

// main loop
for (; i < imax; i += stride)) {
  // some code that uses element(i)'s array element
}

// post loop
for (; i < stop; i += stride)) {
  if (scale * i + offset >=u array.length) {
    deoptimize();
  }
  // some code that uses element(i)'s array element
}

Recall the final loop shape pattern outlined above: element(i) is scale * i + offset, with scale and offset required to be loop-invariant integer values, and with scale and stride assumed to be positive. A range check that has this pattern is guaranteed to not fail for:

  1. scale * i + offset >=0
  2. scale * i + offset < array.length

Which implies:

  1. i >= -offset / scale
  2. i < (array.length - offset) / scale

It's actually a bit trickier than that because integer rounding has to be taken into account. For instance, if offset is -1 and scale is 2, the formula in bullet 1 above gives i >= 0; but for i = 0, scale * i + offset = -1, which is an out-of-bound access. So instead:

  1. i >= -offset / scale + 1
  2. i < (array.length - offset) / scale

Integer rounding can lead to a slightly pessimistic but still correct inequality for the formula in bullet 2 here, as the integer division rounds down. For instance, if array.length is 10, offset is 1 and scale is 2, the formula in the second bullet results in i < 4; but for i = 4, scale * i + offset = 9, which is a valid access.

Using the inequalities above, all it takes is to set:

  • imin = min(stop, (-offset + 1) / scale)
  • imax = min(stop, (array.length - offset) / scale)

This can be generalized to multiple range checks (with different values of scale, offset, and array.length) and all combinations of signs for scale and stride.

Why counted loops are key to performance

Counted loops lend themselves well to optimizations other than range check elimination. Unrolling is a particularly important technique used in HotSpot. Unrolling involves making multiple copies of the loop body, one after another, to create a new loop body that executes several iterations of the initial loop at a time with a single exit test at the end of all the copies. For instance, a loop body can be unrolled 8 times. If the initial loop iterates 80 times, then after unrolling it only executes the unrolled loop body 10 times.

But what if the iteration count is 81? Then executing the unrolled loop body an extra time would cause the iteration count to reach 88! To solve this issue, unrolling also takes advantage of loop clones. The main loop body is unrolled to execute the 8 copies of the initial loop. The main loop's bounds are adjusted so that when only 1 iteration (in this example) is left, execution leaves the unrolled loop. Execution then continues in a copy that executes iterations one at a time.

Unrolling is important because the cost of the exit test is amortized over several iterations, and because it enables optimizations to apply to a larger body where redundancies are likely to be found and eliminated. One particularly important optimization that's enabled by unrolling in HotSpot is vectorization. A vector CPU instruction is an instruction that processes multiple elements at a time rather than operating on a single element of an array. For instance, with vector instructions, a CPU can load or store multiple array elements or perform arithmetic operations on multiple elements with a single instruction.

Code shapes that lend themselves well to vectorization usually experience a major performance improvement. HotSpot implements auto-vectorization by looking in an unrolled loop body for memory or arithmetic operations that apply to adjacent array elements. This rarely happens in most untransformed loops. However, unrolling by a factor of 4 or 8 can often produce code that includes operations on the exact number of adjacent elements that can be stored in a vector register.

Conclusion

This article has covered range check elimination in loops, a key optimization for Java because every array access is guarded by a bound check, and arrays are usually used heavily in loops. You learned how speculation enables such optimization and how targeting common loop shapes allows HotSpot to eliminate bound checks. It also covers counted loops, a particular category of loops commonly found in developers' code, and how they are particularly suitable for optimizations.

Related Posts

  • Runtime profiling in OpenJDK's HotSpot JVM

  • How the JIT compiler boosts Java performance in OpenJDK

Recent Posts

  • How to run AI models in cloud development environments

  • How Trilio secures OpenShift virtual machines and containers

  • How to implement observability with Node.js and Llama Stack

  • How to encrypt RHEL images for Azure confidential VMs

  • How to manage RHEL virtual machines with Podman Desktop

Red Hat Developers logo LinkedIn YouTube Twitter Facebook

Products

  • Red Hat Enterprise Linux
  • Red Hat OpenShift
  • Red Hat Ansible Automation Platform

Build

  • Developer Sandbox
  • Developer Tools
  • Interactive Tutorials
  • API Catalog

Quicklinks

  • Learning Resources
  • E-books
  • Cheat Sheets
  • Blog
  • Events
  • Newsletter

Communicate

  • About us
  • Contact sales
  • Find a partner
  • Report a website issue
  • Site Status Dashboard
  • Report a security problem

RED HAT DEVELOPER

Build here. Go anywhere.

We serve the builders. The problem solvers who create careers with code.

Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

Sign me up

Red Hat legal and privacy links

  • About Red Hat
  • Jobs
  • Events
  • Locations
  • Contact Red Hat
  • Red Hat Blog
  • Inclusion at Red Hat
  • Cool Stuff Store
  • Red Hat Summit

Red Hat legal and privacy links

  • Privacy statement
  • Terms of use
  • All policies and guidelines
  • Digital accessibility

Report a website issue