Optimize a Native Executable with Profile-Guided Optimizations

GraalVM Native Image offers quick startup and less memory consumption for a Java application, running as a native executable, by default. You can optimize this native executable even more for additional performance gain and higher throughput by applying Profile-Guided Optimizations (PGO).

With PGO you can collect the profiling data in advance and then feed it to the native-image tool, which will use this information to optimize the performance of the resulting binary.

Note: PGO is not available in GraalVM Community Edition.

This guide shows how to apply PGO and transform your Java application into an optimized native executable.

Run a Demo

For the demo part, you will run a Java application performing queries implemented with the Java Streams API. A user is expected to provide two integer arguments: the number of iterations and the length of the data array. The application creates the data set with a deterministic random seed and iterates 10 times. The time taken for each iteration and its checksum is printed to the console.

Below is the stream expression to optimize:

Arrays.stream(persons)
   .filter(p -> p.getEmployment() == Employment.EMPLOYED)
   .filter(p -> p.getSalary() > 100_000)
   .mapToInt(Person::getAge)
   .filter(age -> age > 40)
   .average()
   .getAsDouble();

Follow these steps to build an optimized native executable using PGO.

Note: Make sure you have installed a GraalVM JDK. The easiest way to get started is with SDKMAN!. For other installation options, visit the Downloads section.

  1. Save the following code to the file named Streams.java:

    import java.util.Arrays;
    import java.util.Random;
    
    public class Streams {
    
      static final double EMPLOYMENT_RATIO = 0.5;
      static final int MAX_AGE = 100;
      static final int MAX_SALARY = 200_000;
    
      public static void main(String[] args) {
    
        int iterations;
        int dataLength;
        try {
          iterations = Integer.valueOf(args[0]);
          dataLength = Integer.valueOf(args[1]);
        } catch (Throwable ex) {
          System.out.println("Expected 2 integer arguments: number of iterations, length of data array");
          return;
        }
    
        Random random = new Random(42);
        Person[] persons = new Person[dataLength];
        for (int i = 0; i < dataLength; i++) {
          persons[i] = new Person(
              random.nextDouble() >= EMPLOYMENT_RATIO ? Employment.EMPLOYED : Employment.UNEMPLOYED,
              random.nextInt(MAX_SALARY),
              random.nextInt(MAX_AGE));
        }
    
        long totalTime = 0;
        for (int i = 1; i <= 20; i++) {
          long startTime = System.currentTimeMillis();
    
          long checksum = benchmark(iterations, persons);
    
          long iterationTime = System.currentTimeMillis() - startTime;
          totalTime += iterationTime;
          System.out.println("Iteration " + i + " finished in " + iterationTime + " milliseconds with checksum " + Long.toHexString(checksum));
        }
        System.out.println("TOTAL time: " + totalTime);
      }
    
      static long benchmark(int iterations, Person[] persons) {
        long checksum = 1;
        for (int i = 0; i < iterations; ++i) {
          double result = getValue(persons);
    
          checksum = checksum * 31 + (long) result;
        }
        return checksum;
      }
    
      public static double getValue(Person[] persons) {
        return Arrays.stream(persons)
            .filter(p -> p.getEmployment() == Employment.EMPLOYED)
            .filter(p -> p.getSalary() > 100_000)
            .mapToInt(Person::getAge)
            .filter(age -> age >= 40).average()
            .getAsDouble();
      }
    }
    
    enum Employment {
      EMPLOYED, UNEMPLOYED
    }
    
    class Person {
      private final Employment employment;
      private final int age;
      private final int salary;
    
      public Person(Employment employment, int height, int age) {
        this.employment = employment;
        this.salary = height;
        this.age = age;
      }
    
      public int getSalary() {
        return salary;
      }
    
      public int getAge() {
        return age;
      }
    
      public Employment getEmployment() {
        return employment;
      }
    }
    
  2. Compile the application:
    $JAVA_HOME/bin/javac Streams.java
    

    (Optional) Run the demo application, providing some arguments to observe performance.

    $JAVA_HOME/bin/java Streams 100000 200
    
  3. Build a native executable from the class file, and run it to compare the performance:
     $JAVA_HOME/bin/native-image Streams
    

    An executable file, streams, is created in the current working directory. Now run it with the same arguments to see the performance:

     ./streams 100000 200
    

    This version of the program is expected to run slower than on GraalVM’s or any regular JDK.

  4. Build an instrumented native executable by passing the --pgo-instrument option to native-image:

     $JAVA_HOME/bin/native-image --pgo-instrument Streams
    
  5. Run it to collect the code-execution-frequency profiles:

     ./streams 100000 20
    

    Notice that you can profile with a much smaller data size. Profiles collected from this run are stored by default in the default.iprof file.

    Note: You can specify where to collect the profiles when running an instrumented native executable by passing the -XX:ProfilesDumpFile=YourFileName option at run time.

  6. Finally, build an optimized native executable by specifying the path to the collected profiles:

     $JAVA_HOME/bin/native-image --pgo=default.iprof Streams
    

    Note: You can also collect multiple profile files, by specifying different filenames, and pass them to the native-image tool at build time.

    Run this optimized native executable timing the execution to see the system resources and CPU usage:

     time ./streams 100000 200
    

    You should get the performance comparable to, or faster, than the Java version of the program. For example, on a machine with 16 GB of memory and 8 cores, the TOTAL time for 10 iterations reduced from ~2200 to ~270 milliseconds.

This guide showed how you can optimize native executables for additional performance gain and higher throughput. Oracle GraalVM offers extra benefits for building native executables, such as Profile-Guided Optimizations (PGO). With PGO you “train” your application for specific workloads and significantly improve the performance.