HotSpot Intrinsics
Sometimes, compilers have special treatments for some function implementations. Put simply, they replace the default implementation with another, possibly optimized, implementation. Such functions are known as intrinsic functions in compiler theory.
In this article, we’ll walk through a few examples to see how intrinsic functions are working in the HotSpot JVM.
A Tale of Two Logs
The Math.log()
method in Java computes the natural logarithm of any given number. Typical high school stuff, nothing fancy! Here’s what the implementation of this method looks like in OpenJDK:
@IntrinsicCandidate
public static double log(double a) {
return StrictMath.log(a); // default impl. delegates to StrictMath
}
As shown above, the Math.log()
method itself calls another method called StrictMath.log()
under the hood. Despite this delegation, we usually tend to use the Math.log()
instead of the strict and more direct one!
Premature-ization
Despite Donald Knuth’s efforts, one might propose to use the StrictMath
implementation, mainly to avoid the unnecessary indirection and being more sympathetic to the underlying mechanics!
Well, we all know that when the Math.log()
method gets hot enough (i.e. being called frequently enough), then the HotSpot JVM will inline this delegation. Therefore, it’s only natural to expect that both method calls exhibit similar performance characteristics, at least when the performance matters!.
To prove this hypothesis, let’s conduct a simple benchmark comparing the two implementations:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
public class IntrinsicsBenchmark {
@Param("12346545756.54634")
double value;
@Benchmark
public double indirect() {
return Math.log(value); // Calls the StrictMath.log(value) under the hood.
}
@Benchmark
public double direct() {
return StrictMath.log(value);
}
// typical stuff
}
The result should be so predictable, right?
The Observer Effect
If we package the benchmark and run the following command:
>> java -jar intrinsics.jar -f 2 -t 8
After a while, JMH will print the benchmark result like the following:
Benchmark (value) Mode Cnt Score Error Units
IntrinsicsBenchmark.direct 12346545756.54634 thrpt 20 151571897.277 ± 7878104.343 ops/s
IntrinsicsBenchmark.indirect 12346545756.54634 thrpt 20 309745064.598 ± 12678366.349 ops/s
We didn’t see that coming, did we?! The indirect Math.log()
implementation outperforms the direct and supposedly more performant implementation by almost 105% in terms of throughput!
Pandora’s Box
Let’s take a closer look at the Math.log()
implementation once again, just to make sure we didn’t missed something there:
@IntrinsicCandidate
public static double log(double a) {
return StrictMath.log(a); // default impl. delegates to StrictMath
}
The delegation exists, for sure. Quite interestingly, there is also a @IntrinsicCandidate
annotation on the method. Before going any further, it’s worth mentioning that before Java 16, the same method did look like this:
@HotSpotIntrinsicCandidate
public static double log(double a) {
return StrictMath.log(a); // default impl. delegates to StrictMath
}
So basically, as of Java 16, the jdk.internal.HotSpotIntrinsicCandidate
is repackaged and renamed as jdk.internal.vm.annotation.IntrinsicCandidate
.
Anyway, the @IntrinsicCandidate
may reveal the actual reason behind this shocking benchmark result. Let’s take a peek at the annotation Javadoc:
/**
* The {@code @IntrinsicCandidate} annotation is specific to the
* HotSpot Virtual Machine. It indicates that an annotated method
* may be (but is not guaranteed to be) intrinsified by the HotSpot VM. A method
* is intrinsified if the HotSpot VM replaces the annotated method with hand-written
* assembly and/or hand-written compiler IR -- a compiler intrinsic -- to improve
* performance. The {@code @IntrinsicCandidate} annotation is internal to the
* Java libraries and is therefore not supposed to have any relevance for application
* code.
*
* @since 16
*/
@Target({ElementType.METHOD, ElementType.CONSTRUCTOR})
@Retention(RetentionPolicy.RUNTIME)
public @interface IntrinsicCandidate {
}
Well, based on this, the HotSpot JVM may replace the Math.log()
Java implementation with a possibly more efficient compiler intrinsic to improve the performance.
Down the Rabbit Hole
As it turns out, there actually is an intrinsic for the Math.log()
method!
The HotSpot JVM defines all its intrinsics in the vmIntrinsics.hpp
file1. In the HotSpot, there are two types of intrinsics:
- Library intrinsics: These are typical compiler intrinsics as they will replace the method implementations.
- Bytecode intrinsics: These methods won’t be replaced but instead would have special treatments.
The HotSpot JVM source code documents these two types as follows:
// There are two types of intrinsic methods: (1) Library intrinsics and (2) bytecode intrinsics.
//
// (1) A library intrinsic method may be replaced with hand-crafted assembly code,
// with hand-crafted compiler IR, or with a combination of the two. The semantics
// of the replacement code may differ from the semantics of the replaced code.
//
// (2) Bytecode intrinsic methods are not replaced by special code, but they are
// treated in some other special way by the compiler. For example, the compiler
// may delay inlining for some String-related intrinsic methods (e.g., some methods
// defined in the StringBuilder and StringBuffer classes, see
// Compile::should_delay_string_inlining() for more details).
Right after this, they list all the possible VM intrinsics one after another. For instance:
// Here are all the intrinsics known to the runtime and the CI.
// omitted
/* Math & StrictMath intrinsics are defined in terms of just a few signatures: */ \
do_class(java_lang_Math, "java/lang/Math")
/* here are the math names, all together: */ \
do_name(abs_name,"abs") do_name(sin_name,"sin") do_name(cos_name,"cos") \
do_name(tan_name,"tan") do_name(atan2_name,"atan2") do_name(sqrt_name,"sqrt") \
do_name(log_name,"log") do_name(log10_name,"log10") do_name(pow_name,"pow") \
do_name(exp_name,"exp") do_name(min_name,"min") do_name(max_name,"max") \
do_name(floor_name, "floor") do_name(ceil_name, "ceil") do_name(rint_name, "rint")
do_intrinsic(_dlog, java_lang_Math, log_name, double_double_signature, F_S)
As shown by the last line, there is actually an intrinsic replacement for the Math.log()
. For instance, on x86-64 architectures, the Math.log()
will be intrinsified as follows:
if (vmIntrinsics::is_intrinsic_available(vmIntrinsics::_dlog)) {
StubRoutines::_dlog = generate_libmLog();
}
// the generator
address generate_libmLog() {
StubCodeMark mark(this, "StubRoutines", "libmLog");
address start = __ pc();
const XMMRegister x0 = xmm0;
const XMMRegister x1 = xmm1;
const XMMRegister x2 = xmm2;
const XMMRegister x3 = xmm3;
const XMMRegister x4 = xmm4;
const XMMRegister x5 = xmm5;
const XMMRegister x6 = xmm6;
const XMMRegister x7 = xmm7;
const Register tmp1 = r11;
const Register tmp2 = r8;
BLOCK_COMMENT("Entry:");
__ enter(); // required for proper stackwalking of RuntimeStub frame
__ fast_log(x0, x1, x2, x3, x4, x5, x6, x7, rax, rcx, rdx, tmp1, tmp2);
__ leave(); // required for proper stackwalking of RuntimeStub frame
__ ret(0);
return start;
}
The vmIntrinsics.hpp
only defines the fact that some methods may have intrinsic implementations. The actual intrinsic routine is provided somewhere else and usually depends on the underlying architecture. In the above example, the src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
is responsible for providing the actual intrinsic for the 64-bit x86 architecture.
In addition to being architecture-specific, intrinsics can be disabled. Therefore, the JVM compiler (C1 or C2) checks these two conditions before applying the intrinsic:
virtual bool is_intrinsic_available(const methodHandle& method, DirectiveSet* directive) {
return is_intrinsic_supported(method) &&
!directive->is_intrinsic_disabled(method) &&
!vmIntrinsics::is_disabled_by_flags(method);
}
Basically, an intrinsic is available if:
- The intrinsic is enabled, usually by using a tunable flag.
- The underlying platform supports the intrinsic.
Let’s see more about those tunables.
Tunables
Similar to many other aspects of the JVM, we can control the intrinsics to some extent using tunable flags.
For starters, the combination of -XX:+UnlockDiagnosticVMOptions
and -XX:+PrintIntrinsics
make the HotSpot to print all intrinsics while introducing them. For instance, if we run the same benchmark with these flags, we will see a lot of Math.log()
related logs:
>> java -XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics -jar intrinsics.jar -f 2 -t 8
// truncated logs
@ 4 java.lang.Math::log (5 bytes) (intrinsic)
@ 4 java.lang.Math::log (5 bytes) (intrinsic)
@ 55 java.lang.Math::min (11 bytes) (intrinsic)
@ 58 java.lang.System::arraycopy (0 bytes) (intrinsic)
Also, we can disable all the Math related intrinsics using the -XX:-InlineMathNatives
tunable:
>> java -XX:+UnlockDiagnosticVMOptions -XX:-InlineMathNatives -jar intrinsics.jar -f 1 -t 8
Benchmark (value) Mode Cnt Score Error Units
IntrinsicsBenchmark.direct 12346545756.54634 thrpt 20 171611762.349 ± 4203913.645 ops/s
IntrinsicsBenchmark.indirect 12346545756.54634 thrpt 20 169765587.934 ± 9555128.466 ops/s
As shown above, since the JVM no longer applies the intrinsics for the Math.log()
, the throughputs are almost the same!
Using a simple grep
, as always, we can see all the tunables related to a particular subject:
>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Intrinsic
bool CheckIntrinsics = true
ccstrlist DisableIntrinsic =
bool PrintIntrinsics = false
bool UseAESCTRIntrinsics = true
bool UseAESIntrinsics = true
bool UseAdler32Intrinsics = false
bool UseBASE64Intrinsics = false
bool UseCRC32CIntrinsics = true
bool UseCRC32Intrinsics = true
bool UseCharacterCompareIntrinsics = false
bool UseGHASHIntrinsics = true
bool UseLibmIntrinsic = true
bool UseMathExactIntrinsics = true
bool UseMontgomeryMultiplyIntrinsic = true
bool UseMontgomerySquareIntrinsic = true
bool UseMulAddIntrinsic = true
bool UseMultiplyToLenIntrinsic = true
bool UseSHA1Intrinsics = false
bool UseSHA256Intrinsics = true
bool UseSHA512Intrinsics = true
bool UseSSE42Intrinsics = true
bool UseSquareToLenIntrinsic = true
bool UseVectorizedMismatchIntrinsic = true
And, one more thing:
>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | grep Native
bool CriticalJNINatives = true
bool InlineClassNatives = true
bool InlineMathNatives = true
bool InlineNatives = true
bool InlineThreadNatives = true
Closing Remarks
In this article, we saw how the JVM may replace some critical Java methods with more efficient implementations at runtime.
Of course, the JVM compiler is a complex piece of software. Therefore, covering all the details related to intrinsics is both beyond the scope of this article and certainly beyond the writer’s knowledge. However, I hope this serves as a good starting point for the curious!
As always, the source code is available on GitHub!
footnotes
1. Over the years, the file responsible for declaring the VM intrinsics has changed. For instance, before the vmIntrinsics.hpp
, the vmSymbols.hpp
was the home for all intrinsics.
2. The cover image is from lls-ceilap on Quantum Observer Effect.