In the previous article, we introduced Records, a new preview feature in Java 14. Records provide a nice, compact syntax for declaring classes that are supposed to be dumb data holders. In this article, we’re going to see what Records look like under the hood. So buckle up!
Table of Contents
- Class Representation
- The Curious Case of Data Classes
- Kotlin’s Data Classes
- Scala’s Case Classes
- Invoke Dynamic
- Introducing Indy
- User-Definable Bytecode
- How Does Indy Work?
- Why Indy?
- The Object Methods
- Reflecting on Records
- Annotating Records
Class Representation
Let’s start with a very simple example:
How about compiling this code using javac (with preview features enabled, since Records are a preview feature in Java 14)?
Then, it’s possible to take a peek at the generated bytecode using javap.
This will print the following:
Interestingly, similar to Enums, Records are normal Java classes with a few fundamental properties:
- They are declared as final classes, so we can’t inherit from them.
- They’re already inheriting from another class named java.lang.Record. Therefore, Records can’t extend any other class, as Java does not allow multiple inheritance.
- Records can implement other interfaces.
- For each component, there is an accessor method, e.g. min and max.
- There are auto-generated implementations for toString, equals and hashCode based on all components.
- Finally, there is an auto-generated constructor that accepts all components as its arguments.
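The properties listed above can be checked with a small sketch. The Range record and its int component types are assumptions made for illustration, and the RecordShapeDemo class is hypothetical. On Java 14 and 15 this needs the --enable-preview flag; from Java 16 on it compiles as is.

```java
import java.lang.reflect.Modifier;

class RecordShapeDemo {
    // Hypothetical record; the int component types are an assumption.
    record Range(int min, int max) {}

    // Records are implicitly final.
    static boolean isFinal() {
        return Modifier.isFinal(Range.class.getModifiers());
    }

    // Every record extends java.lang.Record.
    static Class<?> superclass() {
        return Range.class.getSuperclass();
    }

    public static void main(String[] args) {
        Range range = new Range(1, 10);                 // auto-generated constructor
        System.out.println(isFinal());
        System.out.println(superclass());
        System.out.println(range.min() + ".." + range.max()); // accessor methods
        System.out.println(range.equals(new Range(1, 10)));   // component-based equals
        System.out.println(range.hashCode() == new Range(1, 10).hashCode());
        System.out.println(range);                      // component-based toString
    }
}
```

Two Range instances with the same components compare as equal and share a hash code, which is what we would expect from a plain data holder.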
java.lang.Record is just an abstract class with a protected no-arg constructor and a few other basic abstract methods:
There is nothing special about this class!
The Curious Case of Data Classes
Coming from a Kotlin or Scala background, one may spot some similarities between Records in Java, Data Classes in Kotlin and Case Classes in Scala. On the surface, they all share one very fundamental goal: to facilitate writing data holders.
Despite this fundamental similarity, things are very different at the bytecode level.
Kotlin’s Data Class
For the sake of comparison, let’s see a Kotlin data class equivalent of Range:
Similar to Records, the Kotlin compiler generates accessor methods, default toString, equals and hashCode implementations and a few more functions based on this simple one-liner.
Let’s see how the Kotlin compiler generates the code for, say, toString:
We issued javap -c -v Range to generate this output. Also, here we’re using the simple class names for the sake of brevity.
Anyway, Kotlin is using a StringBuilder to generate the string representation instead of multiple string concatenations (like any decent Java developer!). That is:
- At first, it creates a new instance of StringBuilder (index 0, 3, 4).
- Then it appends the literal Range(min= string (index 7, 9).
- Then it appends the actual min value (index 12, 13, 16).
- Then it appends the literal , max= (index 19, 21).
- Then it appends the actual max value (index 24, 25, 28).
- Then it closes the parentheses by appending the ) literal (index 31, 33).
- Finally, it calls toString on the StringBuilder instance and returns the result (index 36, 39).
Basically, the more properties we have in our data class, the longer the bytecode and, consequently, the longer the startup time.
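To make the steps above concrete, here is a rough Java rendering of what that bytecode does. This is only a sketch: the KotlinStyleRange class is hypothetical, the int property types are an assumption, and the real Kotlin output is bytecode, not Java source.

```java
final class KotlinStyleRange {
    private final int min;
    private final int max;

    KotlinStyleRange(int min, int max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public String toString() {
        // One append per literal and one per property, mirroring the bytecode steps.
        return new StringBuilder()
                .append("Range(min=")
                .append(min)
                .append(", max=")
                .append(max)
                .append(")")
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(new KotlinStyleRange(1, 10));
    }
}
```

Each extra property adds another pair of append calls, which is why the generated code grows with the number of properties.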
Scala’s Case Class
Let’s write the case class equivalent in Scala:
At first glance, Scala seems to generate a much simpler toString. However, this toString calls the scala.runtime.ScalaRunTime._toString static method. That, in turn, calls the productIterator method to iterate through this Product Type. This iterator calls the productElement method which looks like:
This basically switches over all properties of the case class. For instance, if the productIterator wants the first property, it returns min. Also, when the productIterator wants the second element, it returns the max value. Otherwise, it throws an instance of IndexOutOfBoundsException to signal an out-of-bounds request.
Again, the more properties we have in a case class, the more of those switch arms we get. Therefore, the bytecode length is proportional to the number of properties. Hence, the same problem as Kotlin’s data classes.
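The switch described above can be sketched in Java. This is an imitation under stated assumptions, not Scala's actual generated code: the CaseClassStyleRange class is hypothetical, the int property types are assumed, and the toString only approximates what ScalaRunTime._toString produces.

```java
import java.util.StringJoiner;

final class CaseClassStyleRange {
    private final int min;
    private final int max;

    CaseClassStyleRange(int min, int max) {
        this.min = min;
        this.max = max;
    }

    int productArity() {
        return 2;
    }

    // One switch arm per property, plus a default arm for out-of-bounds requests.
    Object productElement(int index) {
        switch (index) {
            case 0: return min;
            case 1: return max;
            default: throw new IndexOutOfBoundsException(String.valueOf(index));
        }
    }

    @Override
    public String toString() {
        // Iterates over the product elements instead of naming each field.
        StringJoiner joiner = new StringJoiner(",", "Range(", ")");
        for (int i = 0; i < productArity(); i++) {
            joiner.add(String.valueOf(productElement(i)));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(new CaseClassStyleRange(1, 10));
    }
}
```

The toString itself stays short, but productElement still needs one arm per property.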
Invoke Dynamic
Let’s take an even closer look at the bytecode generated for Java Records:
Regardless of the number of record components, this will be the bytecode. A simple, polished and elegant solution. But how does this invokedynamic thing work?
Introducing Indy
Invoke Dynamic (also known as Indy) was part of JSR 292, which intended to enhance the JVM support for dynamically typed languages. Since its first release in Java 7, the invokedynamic opcode, along with its java.lang.invoke luggage, has been used quite extensively by dynamic JVM-based languages like JRuby.
Although indy was specifically designed to enhance the dynamic language support, it offers much more than that. As a matter of fact, it’s suitable to use wherever a language designer needs any form of dynamicity, from dynamic type acrobatics to dynamic strategies! For instance, the Java 8 Lambda Expressions are actually implemented using invokedynamic, even though Java is a statically typed language!
User-Definable Bytecode
For quite some time, the JVM supported four method invocation types: invokestatic to call static methods, invokeinterface to call interface methods, invokespecial to call constructors, superclass methods or private methods, and invokevirtual to call instance methods.
Despite their differences, these invocation types share one common trait: we can’t enrich them with our own logic. In contrast, invokedynamic enables us to bootstrap the invocation process in any way we want. Then the JVM takes care of calling the bootstrapped method directly.
How Does Indy Work?
The first time the JVM sees an invokedynamic instruction, it calls a special static method called the Bootstrap Method. The bootstrap method is a piece of Java code that we’ve written to prepare the actual to-be-invoked logic.
Then the bootstrap method returns an instance of java.lang.invoke.CallSite. This CallSite holds a reference to the actual method, i.e. a MethodHandle. From now on, every time the JVM sees this invokedynamic instruction again, it skips the slow path and directly calls the underlying executable. The JVM continues to skip the slow path unless something changes.
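Java source code cannot emit an invokedynamic instruction directly, but the moving parts can be imitated by hand. The following sketch calls a hypothetical bootstrap method once and then reuses the returned CallSite. The IndySketch class and the greet method are made up for illustration, and the manual call stands in for what the JVM does on the first execution of the instruction.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class IndySketch {
    // The logic the call site should end up invoking.
    static String greet(String name) {
        return "Hello, " + name;
    }

    // A hypothetical bootstrap method: resolves the target and wraps it in a CallSite.
    static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(IndySketch.class, name, type);
        return new ConstantCallSite(target);
    }

    static String run() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            // Slow path: done once, when the call site is linked.
            CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
            MethodHandle invoker = site.dynamicInvoker();
            // Fast path: later calls go straight to the linked target.
            String first = (String) invoker.invokeExact("indy");
            String second = (String) invoker.invokeExact("again");
            return first + " / " + second;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

A ConstantCallSite is used here because the target never changes after linking; a MutableCallSite would be the choice when the target may be swapped later.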
Why Indy?
As opposed to the Reflection APIs, the java.lang.invoke API is quite efficient since the JVM can completely see through all invocations. Therefore, the JVM may apply all sorts of optimizations as long as we avoid the slow path as much as possible!
In addition to the efficiency argument, the invokedynamic approach is more reliable and less brittle because of its simplicity.
Moreover, the generated bytecode for Java Records is independent of the number of properties. So, less bytecode and faster startup time.
Finally, let’s suppose a new version of Java includes a new and more efficient bootstrap method implementation. With invokedynamic, our app can take advantage of this improvement without recompilation. This way we have some sort of Forward Binary Compatibility. Also, that’s the dynamic strategy we were talking about!
The Object Methods
Now that we are familiar enough with Indy, let’s make sense of the invokedynamic in Records bytecode:
Look what I found in the Bootstrap Method Table:
So the bootstrap method for Records is called bootstrap, which resides in the java.lang.runtime.ObjectMethods class. As you can see, this bootstrap method expects the following parameters:
- An instance of MethodHandles.Lookup representing the lookup context.
- The method name (i.e. toString, equals, hashCode, etc.) the bootstrap is going to link. For example, when the value is toString, bootstrap will return a ConstantCallSite (a CallSite that never changes) that points to the actual toString implementation for this particular Record.
- The TypeDescriptor for the method.
- A type token, i.e. Class<?>, representing the Record class type. It’s Class<Range> in this case.
- A semi-colon separated list of all component names, i.e. min;max.
- One MethodHandle per component. This way the bootstrap method can create a MethodHandle based on the components for this particular method implementation.
The invokedynamic instruction passes all those arguments to the bootstrap method. The bootstrap method, in turn, returns an instance of ConstantCallSite. This ConstantCallSite holds a reference to the requested method implementation, e.g. toString.
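The parameter list above can be exercised by calling ObjectMethods.bootstrap by hand, which roughly imitates what the JVM does when it links the toString call site. This is a sketch assuming Java 16 or later, where the class is a regular public API. The ObjectMethodsSketch class is hypothetical, the int component types are an assumption, and accessor method handles are used here where the compiler would normally pass handles to the record's fields.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.runtime.ObjectMethods;

class ObjectMethodsSketch {
    // Hypothetical record; the int component types are an assumption.
    record Range(int min, int max) {}

    static String toStringViaBootstrap(Range range) {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            // One MethodHandle per component (here: handles to the accessors).
            MethodHandle minGetter =
                    lookup.findVirtual(Range.class, "min", MethodType.methodType(int.class));
            MethodHandle maxGetter =
                    lookup.findVirtual(Range.class, "max", MethodType.methodType(int.class));

            // Lookup, method name, type descriptor, record class, names, getters.
            CallSite site = (CallSite) ObjectMethods.bootstrap(
                    lookup,
                    "toString",
                    MethodType.methodType(String.class, Range.class),
                    Range.class,
                    "min;max",
                    minGetter, maxGetter);

            return (String) site.dynamicInvoker().invoke(range);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Range range = new Range(1, 10);
        System.out.println(toStringViaBootstrap(range));
        System.out.println(range);
    }
}
```

Both lines printed by main should be identical, since the record's own toString is linked through the same bootstrap method.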
Reflecting on Records
The java.lang.Class API has been retrofitted to support Records. For example, given a Class<?>, we can check whether it’s a Record or not using the new isRecord method.
It obviously returns false for non-record types.
There is also a getRecordComponents method, which returns an array of RecordComponent in the same order they are defined in the original record. Each java.lang.reflect.RecordComponent represents a record component or variable of the current record type. For example, RecordComponent.getName returns the component name:
In the same way, the getType method returns the type token for each component:
It’s even possible to get a handle to accessor methods via getAccessor.
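The reflection calls mentioned in this section can be combined into one small sketch. The RecordReflectionSketch class and its describe helper are hypothetical, and the Range record with int components is an assumption.

```java
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

class RecordReflectionSketch {
    // Hypothetical record; the int component types are an assumption.
    record Range(int min, int max) {}

    // Returns one "type name = value" line per component, or nothing for non-records.
    static List<String> describe(Object value) {
        List<String> lines = new ArrayList<>();
        Class<?> type = value.getClass();
        if (!type.isRecord()) {
            return lines;
        }
        try {
            for (RecordComponent component : type.getRecordComponents()) {
                Object current = component.getAccessor().invoke(value);
                lines.add(component.getType().getName() + " "
                        + component.getName() + " = " + current);
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(new Range(1, 10)).forEach(System.out::println);
        System.out.println(describe("not a record").isEmpty());
    }
}
```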
Annotating Records
Java permits annotating Records, as long as the annotation is applicable to a record or its members. Additionally, there is a new annotation target named RECORD_COMPONENT. Annotations with this target can only be used on record components:
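As a minimal sketch, an annotation restricted to record components might look like the following. The Unit annotation, its value and the unitOf helper are hypothetical names chosen for illustration, and the runtime retention is only there so that the annotation can be read back reflectively.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;

class RecordAnnotationSketch {
    // Hypothetical annotation that is only applicable to record components.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface Unit {
        String value();
    }

    record Range(@Unit("cm") int min, @Unit("cm") int max) {}

    // Looks up the Unit annotation on the named component, or returns null.
    static String unitOf(String componentName) {
        for (RecordComponent component : Range.class.getRecordComponents()) {
            if (component.getName().equals(componentName)) {
                Unit unit = component.getAnnotation(Unit.class);
                return unit == null ? null : unit.value();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(unitOf("min"));
    }
}
```

Placing @Unit on a field, method or class would be rejected by the compiler, since RECORD_COMPONENT is its only target.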
Any new Java feature without a nasty relationship with Serialization would be incomplete. This time around, however, the relationship does not sound as disgusting as we’re used to.
Although Records are not serializable by default, it’s possible to make them so just by implementing the java.io.Serializable marker interface.
Serializable records are serialized and deserialized differently than ordinary serializable objects. The updated javadoc for ObjectInputStream states that:
- The serialized form of a record object is a sequence of values derived from the record components.
- The process by which record objects are serialized or externalized cannot be customized; any class-specific writeObject, readObject, readObjectNoData, writeExternal, and readExternal methods defined by record classes are ignored during serialization and deserialization.
- The serialVersionUID of a record class is 0L unless explicitly declared.
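A round trip through Java serialization can be sketched as follows. The RecordSerializationSketch class and its roundTrip helper are hypothetical, and the Range record with int components is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RecordSerializationSketch {
    // Implementing the marker interface is all it takes to opt in.
    record Range(int min, int max) implements Serializable {}

    // Serializes the record to a byte array and reads it back.
    static Range roundTrip(Range original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Range) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Range original = new Range(1, 10);
        System.out.println(original.equals(roundTrip(original)));
    }
}
```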
Java Records are going to provide a new way to encapsulate data holders. Even though they’re currently limited in terms of functionality (compared to what Kotlin or Scala offer), the implementation is solid.
The first preview of Records will be available in March 2020. In this article, we’ve used the openjdk 14-ea 2020-03-17 build, since Java 14 is yet to be released!