Continuing on from the previous post about shared state – let us try and get some concept of expected and actual state. This is looking in from the outside.
Few assumptions first:
- State can be written or read.
- In terms of multiple processes writing to a shared state store – we only care about state read and state written and assume that half updates are not allowed at the lowest level – to simplify we are not looking at object updates where say out of three items two are updated to the new value but the third one is not, at the level of the object we have a partial update but at the level of the individual items we will either see the the new value or the old value – this is to simplify the discussion
- Reads can be with expectation or without expectation (other than the core expectation of ‘success’ and valid ‘format’) – either the reader uses the value read (e.g. in an ‘if’ condition) or simply transfers it on without using it (e.g. as access API) – what the reader does with the value is independent of the issue of shared state.
- Writes can be with checks or not – either the writer doesn’t care about current state before writing the new state OR it does (so called compare and swap)
Therefore the standard model for un-secured parallel read-write of state involves all types of Reads (R) and Writes (W) to the same shared State (S). From the previous post we know the conditions that lead to issues with access to shared state.
In Figure 1 there is no way of knowing what is the final state of S given three parallel writes by Wa, Wb and Wc. The expectation is that S is either equal to 1, 2 or 3. Similarly, we cannot know what values are read by parallel readers Ra, Rb and Rc. Expectation again is that the values are either 1,2 or 3. But depending on S, Readers may see non-initialised values as well (for example if read happens before any writes).
The writer in this case treats the State as a black box. Therefore, there are no expectations of current value of S and writes are done in a fire and forget manner.
One way of improving this is to make writes that do compare and swap (CAS) – that is forming an expectation of existing state before changing it.
Figure 2 shows this at work. Two writers (W1 and W2) are writing to State S at the same time. We cannot be sure when they start and finish. Initial value of S = 100. Both are attempting to change the State of S from 100 to 200 and 300 respectively. Since these are overlapping writes we need to provide some sort of ordering to achieve consistency.
To provide ordering we write with expectation. To build the expectation each writer must perform a read first. This initial read gives the writer a foundation for their own write. Usually the subsequent compare and swap operation is an atomic one with support at the lowest levels (e.g. see the sun.misc.Unsafe and its use by concurrency API in Java for CAS).
W1 initiates the write by forming an expectation of the current state (100). While it attempts the CAS operation on S, W2 also initiates the write by forming an expectation. As W1’s CAS has not finished yet the current state is still 100. W2 is then able to slip its CAS ahead of W1 (due to say thread priority). That compares expectation to actual and then sets S to new value of 200. Now W1 catches up and issues a CAS request. CAS checks the current state with the expected state – this no longer matches as current state = 200 and expected is 100. The operation fails and W1 is free to retry (with say exponential back-off).
Putting Expectations to Work
Lets put forward a question at this stage:
What value does R2 get when it tries to read the S? As we can see the value of S changes from 100 to 200 while the read is taking place. For example: If we had instantaneous reads then R2 should get the value 100.
The answer is that it depends on the specific implementation.
We can use some sort of locking mechanism to coordinate between reads and writes. This would solve the parallel access issue by making sure there is only one reader or write at a given time. But this is not efficient as we are blocking parallel reads as well.
The main issue is that we don’t want to lock large chunks (e.g. a full document or a full row) but we may have to as we don’t want partial state updates to be available.
To be ‘atomic’ in case of a multi-part object (like a document or a row with multiple columns) means seeing all the changes part of a ‘write transaction’ reflected or none at all.
For example if we have a document that contains details of a game player with: ‘games played’, ‘games lost’, ‘games won’, ‘no results’. Then if we don’t provide atomic behaviour when updating after a game we could allow reads that don’t make sense (e.g. when ‘games played’ is different from the sum of ‘games lost’, ‘games won’ and ‘no results’).
As we spread the scope of the transaction the atomic update guarantee requires locks on more objects.
The other solution (rather than object wide locks) is to use a pointer to the object from the ‘table’ or ‘index’. Here an object is immutable. New versions require creation of new objects and updating a reference to point to the new version.
A single reference value can be easier to update atomically (e.g. using Compare and Swap) than a whole object. Therefore there is no need to block parallel reads and writes if we can guarantee that at any time the pointer to the object will be updated atomically. In this case, reads will see either the new value or the old value.
Furthermore what if there are relationships between entities that need to be updated simultaneously (a group of interrelated objects being updated simultaneously) ? For example between game result data and player data or money sender and receiver?
In the above case we need to guarantee that all related pointer values (as required by the transaction) will be updated atomically. This is lot harder to achieve.
One way this could be done is by creating a duplicate set of inter-related objects with the newer versions. This allows reads to happen (writes are blocked so that we don’t have multiple competing new versions) while the updated versions are being created. The problem will happen when its time to switch all of the related objects to the newer version without exposing a mix of old and new versions.
For this the traditional solution is to lock all connections (for reads or writes) while the version references are updated. Practically this should be a quick operation as the new structures already exist. The old versions can then be removed permanently (usually this is a compaction procedure that releases storage space).