[][src]Module lockfreehashmap::map_inner

This module implements most of the logic behind the [::LockFreeHashMap].

The talk that Dr. Click gave is available here. However, the information below should ideally be enough to understand all the necessary code.

The Rust standard library has an implementation of a HashMap. However, to use it concurrently (and safely), one must put a lock on it. This is very inefficient and can lead to deadlocks. In order to make something lock-free, at least one thread has to make progress after some time. One of the main advantages of using lock-free structures is to avoid deadlocks and livelocks. Concurrent algorithms typically make use of the atomic operation "compare and swap" (CAS), which atomically swaps one value for another only if the current value is equal to an expected value. (They can also equivalently use Load Linked/Store Conditional) Considering that a hash map stores a key/value pair (i.e. a key with some associated value), it is important that a key is always associated with a value it's supposed to be associated to. This means avoiding an inconsistent state where you have a key/value pair (K, V), where V could never be associated with K. Lock-free hash maps have use cases in databases, web caching and large-scale business programs.

Guarantees

Valid States

Key for State Diagram

Value Meaning
∅/null Empty; No value here.
K A key is here.
X This key slot is taken because there is a newer table available.
V A value is here.
V' A value is here but the table is currently being resized.
T Value was removed.

State Diagram

  (∅, ∅) ---------> (X, ∅)
     |
     |
    \|/
  (K, ∅) ---------> (K, V) <=======> (K, T)
     |                 |                |
     |                 |                |
     |                \|/              \|/
     |              (K, V') -------> (K, X)
     |                                 /|\
     |                                  |
     └----------------------------------┘

From this diagram, you can see that once a key slot is taken, it will always point to that key and only that key. Therefore, if you have two threads (thread 1, thread 2) inserting (K1, V1) and (K2, V2) respectively, where hash(K1) == hash(K2) == 5 (for example), and K1 != K2, and the array containing the key/value pairs is (null, null) at index 5. Then one of the threads, e.g. thread 1, will perform the transition (null, null) -> (K1, null) at array index 5, allowing it afterwards to insert its value V1, The other thread needs to continue probing (at index 6, 7, ...) to find a different key slot. If another thread (thread 3) already inserted a key K2 at some index after 5, then obviously thread 2 does not need to insert its key into the map at all. Thus, there are no inconsistent states where you have a value that is paired with a key that it's not associated to.

Resizing

To resize the map, a new bigger array needs to be allocated and all the key/value pairs have to be moved from their current slots in the current array into slots in the new array. Because other threads can call insert() and remove() while the key/value pairs are moved, there needs to be some way of determining what order this happened in and how to copy the slot into the newer table. This is done by having any calls that try to access the current slot try and help complete the copy if they find a V' value. So if a thread calls e.g. get() while this is happening, it needs to copy the current slot into the new map before returning. (As an implementation detail, it also helps to copy other slots while it's at it.)

The transitions necessary in both the old and new map are shown below:

Transition# |        [1]        |        [2]       |        [3]       |         [4]
Old Map     | (K, V) -> (K, V') |                  |                  | (K, V') -> (K, X)
New Map     |                   | (∅, ∅) -> (K, ∅) | (K, ∅) -> (K, V) |

If multiple threads are trying to replace() data at index i in the map, they have to either do it before transition [1] happens, or after transition [4] happens. The main goal of this data structure is to be completely lock-free/non-blocking. Therefore, instead of looping and waiting until all transitions have finished, which is essentially a blocking algorithm, each thread can help make progress by doing any (or all) of the 4 transitions above.

Structs

MapInner

A map containing a unique, non-resizable array to the Key/Value pairs. If the map needs to be resized, a new MapInner must be created and its Key/Value pairs must be copied from this one. Logically, this struct owns its keys and values, and so is responsible for freeing them when dropped.

Enums

KeyCompare

Sometimes when calling put_if_match() we want to insert a key and sometimes we just want to compare it with some variable of type Q. This enum represents which one is intended.

KeySlot

The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a key can be in, excluding the null/empty state.

Match

Sometimes, when inserting a new value into the hash map, we only want to insert something if the value already matches something.

PutValue

This enum represents the value to insert when calling put_if_match(), which is usually owned when called from LockFreeHashMap and shared if its been copied from a previous, smaller map.

QRef

See KeyCompare::as_qref() for the motivation behind this type.

QRef2

See KeyCompare::as_qref() for the motivation behind this type.

ValueSlot

The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a value can be in, excluding the null/empty state.

Type Definitions

KVPair