[−][src]Module lockfreehashmap::map_inner
This module implements most of the logic behind the [::LockFreeHashMap].
The talk that Dr. Click gave is available here. However, the information below should ideally be enough to understand all the necessary code.
The Rust standard library has an implementation of a HashMap.
However, to use it concurrently (and safely), one must put a lock on it.
This is very inefficient and can lead to deadlocks.
In order to make something lock-free, at least one thread has to make progress after some time.
One of the main advantages of using lock-free structures is to avoid deadlocks and livelocks.
Concurrent algorithms typically make use of the atomic operation "compare and swap" (CAS),
which atomically swaps one value for another
only if the current value is equal to an expected value.
(They can also equivalently use Load Linked/Store Conditional)
Considering that a hash map stores a key/value pair (i.e. a key with some associated value),
it is important that a key is always associated with a value it's supposed to be associated
to.
This means avoiding an inconsistent state where you have a key/value pair (K, V)
,
where V
could never be associated with K
.
Lock-free hash maps have use cases in databases, web caching and large-scale business programs.
Guarantees
- This map has the same "guarantees" as simply having some number of global variables that can be updated atomically.
O(n)
time complexity, like any hash map.
Valid States
Key for State Diagram
Value | Meaning |
---|---|
∅/null | Empty; No value here. |
K | A key is here. |
X | This key slot is taken because there is a newer table available. |
V | A value is here. |
V' | A value is here but the table is currently being resized. |
T | Value was removed. |
State Diagram
(∅, ∅) ---------> (X, ∅)
|
|
\|/
(K, ∅) ---------> (K, V) <=======> (K, T)
| | |
| | |
| \|/ \|/
| (K, V') -------> (K, X)
| /|\
| |
└----------------------------------┘
From this diagram,
you can see that once a key slot is taken,
it will always point to that key and only that key.
Therefore,
if you have two threads (thread 1, thread 2)
inserting (K1, V1)
and (K2, V2)
respectively,
where hash(K1) == hash(K2) == 5 (for example)
,
and K1 != K2
,
and the array containing the key/value pairs is (null, null)
at index 5.
Then one of the threads, e.g. thread 1,
will perform the transition (null, null) -> (K1, null)
at array index 5,
allowing it afterwards to insert its value V1
,
The other thread needs to continue probing (at index 6, 7, ...) to find a different key slot.
If another thread (thread 3) already inserted a key K2
at some index after 5,
then obviously thread 2 does not need to insert its key into the map at all.
Thus, there are no inconsistent states where you have a value that is paired with a key
that it's not associated to.
Resizing
To resize the map,
a new bigger array needs to be allocated
and all the key/value pairs
have to be moved from their current slots in the current array
into slots in the new array.
Because other threads can call insert()
and remove()
while the key/value pairs are moved,
there needs to be some way of determining what order this happened in
and how to copy the slot into the newer table.
This is done by having any calls that try to access the current slot try and help complete
the copy if they find a V'
value.
So if a thread calls e.g. get()
while this is happening,
it needs to copy the current slot into the new map before returning.
(As an implementation detail, it also helps to copy other slots while it's at it.)
The transitions necessary in both the old and new map are shown below:
Transition# | [1] | [2] | [3] | [4]
Old Map | (K, V) -> (K, V') | | | (K, V') -> (K, X)
New Map | | (∅, ∅) -> (K, ∅) | (K, ∅) -> (K, V) |
- Transition [1] marks the value slot as being copied.
If another thread tries to access it while it's
V'
, then it must help complete copying this key/value pair into the newer map. Note that if V is actually null orT
, we can simply set it toX
and skip inserting a tombstone/null value into the newer map, which just wastes a key slot. - Transition [2] and [3] copy the old value into the new map.
They are separate states to remind you that separate threads can perform either.
If inserting it fails,
then we know that some thread didn't care about the current value
and just
insert()
ed a new value anyway. This is fine and just means thatV
needs to be deallocated. This is because if the slot wasn't being copied, it would have overridden the currentV
with another, which would have ended up being copied into the newer table afterwards. - Finally, transition [4] completes the copy by marking the valueslot as
X
, meaning there could be something here (or not) but you need to see the newer table to find out.
If multiple threads are trying to replace()
data at index i
in the map,
they have to either do it before transition [1] happens,
or after transition [4] happens.
The main goal of this data structure is to be completely lock-free/non-blocking.
Therefore, instead of looping and waiting until all transitions have finished,
which is essentially a blocking algorithm,
each thread can help make progress by doing any (or all) of the 4 transitions above.
Structs
MapInner |
A map containing a unique, non-resizable array to the Key/Value pairs. If the map needs to be
resized, a new |
Enums
KeyCompare |
Sometimes when calling |
KeySlot |
The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a key can be in, excluding the null/empty state. |
Match |
Sometimes, when inserting a new value into the hash map, we only want to insert something if the value already matches something. |
PutValue |
This enum represents the value to insert when calling |
QRef |
See |
QRef2 |
See |
ValueSlot |
The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a value can be in, excluding the null/empty state. |
Type Definitions
KVPair |