Lock Striping
There are dozens of decent lock-free hashtable implementations. Usually, instead of plain locks, those data structures use CAS-based operations to stay lock-free. With this narrative, it might sound like I’m going to use this post to build an argument for lock-free data structures. Quite surprisingly, that’s not the case. On the contrary, here we’re going to talk about Plain Old Locks.
Now that everybody’s on board, let’s implement a concurrent version of Map with this simple contract:
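A minimal sketch of such a contract (the SimpleMap name and the exact signatures here are just one plausible shape, not necessarily the article’s exact code):

```java
// A deliberately small map contract: put returns the previous value (or null),
// get returns the current value (or null), remove reports whether a key existed.
public interface SimpleMap<K, V> {
    V put(K key, V value);
    V get(K key);
    boolean remove(K key);
}
```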
Since this is going to be a thread-safe data structure, we should acquire some locks and subsequently release them while putting, getting or removing elements:
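One way to model this is an abstract base class that leaves the locking strategy entirely to subclasses. A minimal sketch (again, the names are my own):

```java
public abstract class AbstractConcurrentMap<K, V> implements SimpleMap<K, V> {

    // Template methods: each subclass decides how an operation
    // on a given key gets synchronized.
    protected abstract void acquire(K key);
    protected abstract void release(K key);
}
```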
Also, we should initialize the implementation with a pre-defined number of buckets:
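Continuing the sketch, each bucket can be a plain linked list of entries, along with an atomic counter so we can later compute the average bucket size:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Inside AbstractConcurrentMap<K, V>:
protected static class Entry<K, V> {
    final K key;
    V value;

    Entry(K key, V value) {
        this.key = key;
        this.value = value;
    }
}

// volatile, because resize() will replace this reference wholesale
protected volatile List<List<Entry<K, V>>> table;
protected final AtomicInteger size = new AtomicInteger();

protected AbstractConcurrentMap(int initialBuckets) {
    table = new ArrayList<>(initialBuckets);
    for (int i = 0; i < initialBuckets; i++) {
        table.add(new LinkedList<>());
    }
}
```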
One Size Fits All?
What happens when the number of stored elements is way more than the number of buckets? Even in the presence of the most versatile hash functions, our hashtable’s performance would degrade significantly. In order to prevent this, we need to add more buckets when needed:
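In the same spirit as acquire and release, we can leave the when and the how of resizing to subclasses:

```java
// Inside AbstractConcurrentMap<K, V>:

// Should we grow the table right now?
protected abstract boolean shouldResize();

// Grow the table and rehash all existing entries.
protected abstract void resize();
```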
With the help of these abstract template methods, here’s how we can put a new key-value pair into our map:
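Given those pieces, a sketch of put might look like this (bucketFor is a small helper I’m adding for the modulo logic):

```java
// Inside AbstractConcurrentMap<K, V>:
protected int bucketFor(K key) {
    // Mask off the sign bit so the modulo can't go negative.
    return (key.hashCode() & 0x7fffffff) % table.size();
}

@Override
public V put(K key, V value) {
    acquire(key);
    try {
        List<Entry<K, V>> bucket = table.get(bucketFor(key));
        for (Entry<K, V> entry : bucket) {
            if (entry.key.equals(key)) {
                V previous = entry.value;
                entry.value = value; // the key already exists: update in place
                return previous;
            }
        }
        bucket.add(new Entry<>(key, value)); // brand new key: append to its bucket
        size.incrementAndGet();
    } finally {
        release(key);
    }
    if (shouldResize()) {
        resize();
    }
    return null;
}
```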
In order to find a bucket for the new element, we rely on the hashCode of the key. Also, we should calculate that hash modulo the number of buckets to prevent going out of bounds!
The remaining parts are pretty straightforward: we acquire a lock for the new entry and only add the entry if it does not exist already; otherwise, we update the current entry. At the end of the operation, if we conclude that we should add more buckets, we do so to maintain the hashtable’s balance. The get and remove methods can be implemented just as easily.
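For instance, continuing the same sketch:

```java
// Inside AbstractConcurrentMap<K, V>:
@Override
public V get(K key) {
    acquire(key);
    try {
        for (Entry<K, V> entry : table.get(bucketFor(key))) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null; // no such key
    } finally {
        release(key);
    }
}

@Override
public boolean remove(K key) {
    acquire(key);
    try {
        boolean removed = table.get(bucketFor(key)).removeIf(e -> e.key.equals(key));
        if (removed) {
            size.decrementAndGet();
        }
        return removed;
    } finally {
        release(key);
    }
}
```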
Let’s fill in the blanks by implementing all those abstract methods!
One Lock to Rule Them All
The simplest way to implement a thread-safe version of this data structure is to use just one lock for all its operations:
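A minimal sketch, building on the abstract class above (we’ll fill in the resizing parts shortly):

```java
import java.util.concurrent.locks.ReentrantLock;

public class CoarseGrainedConcurrentMap<K, V> extends AbstractConcurrentMap<K, V> {

    // A single lock guarding every bucket.
    private final ReentrantLock lock = new ReentrantLock();

    public CoarseGrainedConcurrentMap(int initialBuckets) {
        super(initialBuckets);
    }

    @Override
    protected void acquire(K key) {
        lock.lock(); // no matter which key, always the same lock
    }

    @Override
    protected void release(K key) {
        lock.unlock();
    }
}
```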
The current acquire and release implementations are completely independent of the map entry. As promised, we’re using just one ReentrantLock for synchronization. This approach is known as Coarse-Grained Synchronization, since we’re using just one lock to enforce exclusive access across the whole hashtable.
Also, we should resize the hashtable to maintain its constant access and modification time. In order to do that, we can incorporate different heuristics. For example, when the average bucket size exceeds a specific number:
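A minimal heuristic along those lines, with an arbitrary threshold of 5:

```java
// Inside CoarseGrainedConcurrentMap<K, V>:
private static final int MAX_AVERAGE_BUCKET_SIZE = 5;

@Override
protected boolean shouldResize() {
    // Grow when, on average, each bucket holds more than a handful of entries.
    return size.get() / table.size() > MAX_AVERAGE_BUCKET_SIZE;
}
```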
In order to add more buckets, we should copy the current hashtable and create a new table with, say, twice the size. Then add all entries from the old table to the new one:
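Here is a sketch of that, guarded by the same single lock (note the re-check, in case another thread resized while we were waiting):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Inside CoarseGrainedConcurrentMap<K, V>:
@Override
protected void resize() {
    lock.lock();
    try {
        if (!shouldResize()) {
            return; // another thread already grew the table
        }
        List<List<Entry<K, V>>> oldTable = table;
        List<List<Entry<K, V>>> newTable = new ArrayList<>(oldTable.size() * 2);
        for (int i = 0; i < oldTable.size() * 2; i++) {
            newTable.add(new LinkedList<>());
        }
        table = newTable; // from here on, bucketFor computes against the doubled size
        for (List<Entry<K, V>> bucket : oldTable) {
            for (Entry<K, V> entry : bucket) {
                table.get(bucketFor(entry.key)).add(entry);
            }
        }
    } finally {
        lock.unlock();
    }
}
```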
Please note that we should acquire and release the same lock for the resize operation, too.
Sequential Bottleneck
Suppose there are three concurrent requests for putting something into the first bucket, getting something from the third bucket and removing something from the sixth bucket:
Ideally, we expect a highly scalable concurrent data structure to serve such requests with as little coordination as possible. However, this is what happens in the real world:
As we’re using only one lock for synchronization, concurrent requests will be blocked behind the first one that acquired the lock. When the first one releases the lock, the second one acquires it and, after some time, releases it, and so on. This phenomenon is known as a Sequential Bottleneck (something like head-of-line blocking), and we should mitigate its effect in concurrent environments.
The United Stripes of Locks
One way to improve our current coarse-grained implementation is to use a collection of locks instead of just one. This way, each lock would be responsible for synchronizing one or more buckets. This is a more fine-grained form of synchronization compared to our current approach:
Here we’re dividing the hashtable into a few parts and synchronizing each part independently. This way, we reduce the contention for each lock and consequently improve the throughput. This idea is also called Lock Striping because we synchronize different parts (i.e. stripes) independently:
We need a method to find an appropriate lock for each entry (i.e. bucket):
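Since the number of locks stays fixed, a simple modulo over the lock array does the trick. A sketch of the striped implementation:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockStripedConcurrentMap<K, V> extends AbstractConcurrentMap<K, V> {

    // One lock per stripe; each lock guards one or more buckets.
    private final ReentrantLock[] locks;

    public LockStripedConcurrentMap(int initialBuckets, int stripes) {
        super(initialBuckets);
        locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    private ReentrantLock lockFor(K key) {
        // The number of stripes never changes, so a key always maps to the
        // same lock, even after the bucket table itself has been resized.
        return locks[(key.hashCode() & 0x7fffffff) % locks.length];
    }
}
```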
Then we can acquire and release a lock for an operation on a particular entry as follows:
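```java
// Inside LockStripedConcurrentMap<K, V>:
@Override
protected void acquire(K key) {
    lockFor(key).lock(); // only this key's stripe, not the whole table
}

@Override
protected void release(K key) {
    lockFor(key).unlock();
}
```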
The resize procedure is almost the same as before. The only difference is that we should acquire all locks before the resize and release all of them after the resize.
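A sketch of that, reusing the same average-bucket-size heuristic:

```java
// Inside LockStripedConcurrentMap<K, V>:
@Override
protected boolean shouldResize() {
    return size.get() / table.size() > 5; // same arbitrary threshold as before
}

@Override
protected void resize() {
    // Resizing touches every bucket, so take all the stripes, always in the
    // same order (to avoid deadlocks), and release them all at the end.
    for (ReentrantLock lock : locks) {
        lock.lock();
    }
    try {
        if (!shouldResize()) {
            return; // another thread already grew the table
        }
        List<List<Entry<K, V>>> oldTable = table;
        List<List<Entry<K, V>>> newTable = new ArrayList<>(oldTable.size() * 2);
        for (int i = 0; i < oldTable.size() * 2; i++) {
            newTable.add(new LinkedList<>());
        }
        table = newTable;
        for (List<Entry<K, V>> bucket : oldTable) {
            for (Entry<K, V> entry : bucket) {
                table.get(bucketFor(entry.key)).add(entry);
            }
        }
    } finally {
        for (ReentrantLock lock : locks) {
            lock.unlock();
        }
    }
}
```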
We expect the fine-grained approach to outperform the coarse-grained one; let’s measure it!
Moment of Truth
In order to benchmark these two approaches, we’re going to run the following workload against both implementations: each time, we put a random key, get a random key and remove a random key. Each workload runs these three operations 100 times. Adding JMH to the mix, each workload is executed thousands of times:
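A rough shape of such a benchmark with JMH (the setup numbers and annotations here are my guesses, not necessarily what produced the figures below):

```java
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@BenchmarkMode(Mode.Throughput)
@State(Scope.Benchmark)
public class LockStripingBenchmark {

    private SimpleMap<Integer, Integer> coarse;
    private SimpleMap<Integer, Integer> striped;

    @Setup
    public void setUp() {
        coarse = new CoarseGrainedConcurrentMap<>(16);
        striped = new LockStripedConcurrentMap<>(16, 16);
    }

    @Benchmark
    @Threads(Threads.MAX) // saturate all cores so the locks actually contend
    public void coarseGrained() {
        workload(coarse);
    }

    @Benchmark
    @Threads(Threads.MAX)
    public void fineGrained() {
        workload(striped);
    }

    // Put, get and remove a random key, a hundred times per invocation.
    private void workload(SimpleMap<Integer, Integer> map) {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < 100; i++) {
            map.put(random.nextInt(), random.nextInt());
            map.get(random.nextInt());
            map.remove(random.nextInt());
        }
    }
}
```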
The coarse-grained implementation can handle 8547 operations per second on average. The fine-grained implementation, on the other hand, can handle up to 13855 operations per second on average.
Recursive Split-Ordered Lists
In their paper, “Split-Ordered Lists: Lock-Free Extensible Hash Tables,” Ori Shalev and Nir Shavit proposed a way to implement a lock-free hashtable. They used an ordering of elements in a linked list such that the list can be repeatedly “split” using a single compare-and-swap operation. All in all, it’s a pretty different approach to building this kind of concurrent data structure, and the paper is highly recommended reading.
Also, the source code for this article is available on GitHub.