[−][src]Module lockfreehashmap::map_inner

This module implements most of the logic behind the [::LockFreeHashMap].

The talk that Dr. Click gave is available here. However, the information below should ideally be enough to understand all the necessary code.

The Rust standard library has an implementation of a HashMap. However, to use it concurrently (and safely), one must put a lock on it. This is very inefficient and can lead to deadlocks. In order to make something lock-free, at least one thread has to make progress after some time. One of the main advantages of using lock-free structures is to avoid deadlocks and livelocks. Concurrent algorithms typically make use of the atomic operation "compare and swap" (CAS), which atomically swaps one value for another only if the current value is equal to an expected value. (They can also equivalently use Load Linked/Store Conditional) Considering that a hash map stores a key/value pair (i.e. a key with some associated value), it is important that a key is always associated with a value it's supposed to be associated to. This means avoiding an inconsistent state where you have a key/value pair (K, V), where V could never be associated with K. Lock-free hash maps have use cases in databases, web caching and large-scale business programs.

Guarantees

This map has the same "guarantees" as simply having some number of global variables that can be updated atomically.
O(n) time complexity, like any hash map.

Valid States

Key for State Diagram

Value	Meaning
∅/null	Empty; No value here.
K	A key is here.
X	This key slot is taken because there is a newer table available.
V	A value is here.
V'	A value is here but the table is currently being resized.
T	Value was removed.

State Diagram

  (∅, ∅) ---------> (X, ∅)
     |
     |
    \|/
  (K, ∅) ---------> (K, V) <=======> (K, T)
     |                 |                |
     |                 |                |
     |                \|/              \|/
     |              (K, V') -------> (K, X)
     |                                 /|\
     |                                  |
     └----------------------------------┘

From this diagram, you can see that once a key slot is taken, it will always point to that key and only that key. Therefore, if you have two threads (thread 1, thread 2) inserting (K1, V1) and (K2, V2) respectively, where hash(K1) == hash(K2) == 5 (for example), and K1 != K2, and the array containing the key/value pairs is (null, null) at index 5. Then one of the threads, e.g. thread 1, will perform the transition (null, null) -> (K1, null) at array index 5, allowing it afterwards to insert its value V1, The other thread needs to continue probing (at index 6, 7, ...) to find a different key slot. If another thread (thread 3) already inserted a key K2 at some index after 5, then obviously thread 2 does not need to insert its key into the map at all. Thus, there are no inconsistent states where you have a value that is paired with a key that it's not associated to.

Resizing

To resize the map, a new bigger array needs to be allocated and all the key/value pairs have to be moved from their current slots in the current array into slots in the new array. Because other threads can call insert() and remove() while the key/value pairs are moved, there needs to be some way of determining what order this happened in and how to copy the slot into the newer table. This is done by having any calls that try to access the current slot try and help complete the copy if they find a V' value. So if a thread calls e.g. get() while this is happening, it needs to copy the current slot into the new map before returning. (As an implementation detail, it also helps to copy other slots while it's at it.)

The transitions necessary in both the old and new map are shown below:

Transition# |        [1]        |        [2]       |        [3]       |         [4]
Old Map     | (K, V) -> (K, V') |                  |                  | (K, V') -> (K, X)
New Map     |                   | (∅, ∅) -> (K, ∅) | (K, ∅) -> (K, V) |

Transition [1] marks the value slot as being copied. If another thread tries to access it while it's V', then it must help complete copying this key/value pair into the newer map. Note that if V is actually null or T, we can simply set it to X and skip inserting a tombstone/null value into the newer map, which just wastes a key slot.
Transition [2] and [3] copy the old value into the new map. They are separate states to remind you that separate threads can perform either. If inserting it fails, then we know that some thread didn't care about the current value and just insert()ed a new value anyway. This is fine and just means that V needs to be deallocated. This is because if the slot wasn't being copied, it would have overridden the current V with another, which would have ended up being copied into the newer table afterwards.
Finally, transition [4] completes the copy by marking the valueslot as X, meaning there could be something here (or not) but you need to see the newer table to find out.

If multiple threads are trying to replace() data at index i in the map, they have to either do it before transition [1] happens, or after transition [4] happens. The main goal of this data structure is to be completely lock-free/non-blocking. Therefore, instead of looping and waiting until all transitions have finished, which is essentially a blocking algorithm, each thread can help make progress by doing any (or all) of the 4 transitions above.

Structs

MapInner

A map containing a unique, non-resizable array to the Key/Value pairs. If the map needs to be resized, a new MapInner must be created and its Key/Value pairs must be copied from this one. Logically, this struct owns its keys and values, and so is responsible for freeing them when dropped.

Enums

KeyCompare	Sometimes when calling `put_if_match()` we want to insert a key and sometimes we just want to compare it with some variable of type `Q`. This enum represents which one is intended.
KeySlot	The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a key can be in, excluding the null/empty state.
Match	Sometimes, when inserting a new value into the hash map, we only want to insert something if the value already matches something.
PutValue	This enum represents the value to insert when calling `put_if_match()`, which is usually owned when called from `LockFreeHashMap` and shared if its been copied from a previous, smaller map.
QRef	See `KeyCompare::as_qref()` for the motivation behind this type.
QRef2	See `KeyCompare::as_qref()` for the motivation behind this type.
ValueSlot	The hash map is implemented as an array of key-value pairs, where each key and value can be one of several states. This enum represents the various states that a value can be in, excluding the null/empty state.

Type Definitions

KVPair