Reputation: 29228
What is the difference between linearizability and serializability (in the context of Java)? Can you please explain the difference between these with an example or provide a good reference?
Upvotes: 92
Views: 31617
Reputation: 4300
A good way to understand this is to look at this problem from a database standpoint. (I know you asked for a context in Java, sorry)
Assuming you are a database. You accept multiple transactions operating on the same object concurrently but you only have one single disk arm.
When you receive multiple transactions at the same time (meaning those transactions overlap), you will have to re-order those operations within transactions in some way so your poor disk arm can handle them one-by-one.
Example: We have two transactions, T1 and T2, and a key foo
with the initial value Y
. Timeline:
T1 starts: read key foo.
T2 starts: write key foo with value X.
T1 continues: read key foo again.
T1 ends.
T2 ends.
In a non-serializable system, T1 might have two different readings of foo during the same transaction, which is inconsistent and undesirable.
The term comes from "serial" -> one by one.
You have the ability to re-arrange those transactions to make it look like they happen sequentially (one by one). In order to do so, you enforce some restrictions or conflict prevention mechanisms such as locking. So even if two transactions arrive at the same time, they are executed one-by-one.
In this case, you might decide to re-arrange T1 -> T2, so T1 will read foo
as Y
. Or you can arrange T2 -> T1, then T1 will read foo
as X
.
From a transaction's perspective, both transactions are ACID-compliant.
The term comes from "linear" -> in a line (implying an order is in place).
Not only do you need to do what serialization requires, but you also need to ensure that operations appear to occur instantaneously at some point between their invocation and completion, preserving the real-time order of transactions. As you might have noticed, the real-time order is the key. In order to claim that you are linearizable, you must find a linearization point for every transaction and then order them according to these points.
For example, you might declare that the end of the transaction is the linearization point.
Asking a DB to decide a linearization point can be challenging, as the linearization point is often externally defined by the user.
Therefore, another name for "linearizability" is "external consistency," as it means being consistent with some external belief of real-time order.
Going back to the example, if the external user agrees to consider the end of the transaction as the linearization point, then a linearizable system will ensure: T1 reads foo
as Y
before T2 writes foo
as X
.
Upvotes: 2
Reputation: 113
Imagine you have your distributed system running on 3 machines (3 replicas). We simply want to write and read value of key "A".
Linearizable
In this model, our system will work like this - As soon as write(A) is done, we can read the latest value from any of the three replicas. Meaning, once an action (write/read) is done, its impact is visible to all future actions across the whole system (Like an atomic operation).
Here, the order of operation doesn't matter. So, in reality, we want to do write and read, but the system can do read and write and its fine and will still be Linearizable. Paxos/Raft algorithms provide this for a transaction for majority nodes.
Serializable
In this model, all the replicas (machines) will process actions/events in same order. So, if one of the replica does write(A) and read(A), other replicas will do in same order.
Here, it's possible to get stale data from other replicas. Let's say - replica_1 does write(A). Read(A) from replica_1 will return latest value, but if we read from replica_2, we may get old value. The only guarantee is whenever (no time limit) replica_2 will process in same order - write and then read. Zookeeper provides this.
This is a good blog to learn more about this - http://www.bailis.org/blog/linearizability-versus-serializability/
Let me know if my understanding is incorrect. Happy to learn and grow !
Upvotes: 0
Reputation: 22914
The central distinction between the two is that serializability is a global property; a property of an entire history of operations/transactions. Linearizability is a local property; a property of a single operation/transaction. Another distinction is that linearizability includes a notion of real-time, which serializability does not: the linearization point of an operation must lie between its invocation and response times. (See Tim Harris: Transactional Memory, 2ed. See Herlihy's slides from The Art of Multiprocessor Programming, the section on Linearizability, which are available here, for some examples and proofs.
Both properties are aimed at the same goal: sequential consistency. From Herlihy's paper:
Much work on databases and distributed systems uses serializability as the basic correctness condition for concurrent computations. In this model, a transaction is a thread of control that applies a finite sequence of primitive operations to a set of objects shared with other transactions. A history is serializable if it is equivalent to one in which transactions appear to execute sequentially, i.e., without interleaving. A (partial) precedence order can be defined on non-overlapping pairs of transactions in the obvious way. A history is strictly serializable if the transactions’ order in the sequential history is compatible with their precedence order...
...Linearizability can be viewed as a special case of strict serializability where transactions are restricted to consist of a single operation applied to a single object. Nevertheless, this single-operation restriction has far-reaching practical and formal consequences, giving linearizable computations a different flavor from their serializable counterparts. An immediate practical consequence is that concurrency control mechanisms appropriate for serializability are typically inappropriate for linearizability because they introduce unnecessary overhead and place unnecessary restrictions on concurrency.
Harris, Tim, James Larus, and Ravi Rajwar: Transactional Memory, 2ed. Synthesis Lectures on Computer Architecture. Morgn & Claypool, 2010. ISBN 9781608452354. URL: http://www.morganclaypool.com/doi/abs/10.2200/S00272ED1V01Y201006CAC011?journalCode=cac
Herlihy, Maurice and Jeanette Wing: Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. Prog. Lang. and Sys. Vol. 12, No. 3, July 1990, Pages 463-492. URL http://www.cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
Papadimitriou, Christos: The Serializability of Concurrent Database Updates. Journal of the ACM Vol 26. No 4. October 1979, pp 631-653. URL http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-210.pdf
Herlihy, Maurice and Nir Shavit: The Art of Multiprocessor Programming. Elsevier, 2008. ISBN 978-0-12-370591-4. URL: http://www.elsevier.com/wps/find/bookdescription.cws_home/714091/description#description PPT slides on linearizability are here: http://pub.ist.ac.at/courses/ppc10/slides/Linearizability.pptx
Attiya, Hagit and Jennifer Welch: Sequential Consistency versus Linearizability. ACM Transactions on Computer Systems Vol. 12, No. 2, May 1994, Pages 91-122. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.4969&rep=rep1&type=pdf
If you really care about this, read the paper that introduced the definitions. For linearizability, that's Linearizability: A Correctness Condition for Concurrent Objects, Herlihy and Wing. It's dense, but worth the attention. Note that in the software transactional memory community, it's an open question whether linearizability is the right goal / property to aim for.
Serializability is about the outcome of a collection of operations/the "system" being expressible as a specific ordering ("as if execution took place in a specific order...") of all the operations. Linearizability is a property of a single subset of operations in the system... an operation/set of operations are linearizable if they appear to the other operations as if they occurred at a specific instant in (logical) time with respect to the others. The canonical paper here is Papadimitriou, The Serializability of Concurrent Database Updates.
Think "atomic operation" when you're thinking about "linearizable." A (set of) operations are linearizable when they (appear to) occur atomically with respect to other parts of the system. A common formulation is "provide the illusion that each operation takes effect instantaneously between its invocation and response." The formulation of linearizability is due to Herlihy, which emphasizes that this is a local property, vs. other kinds of sequential consistency properties like "serializability" which are global.
Upvotes: 115
Reputation: 27727
There's a great explanation by Peter Bailis here:
"In plain English, under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where “later” is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write."
"Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions."
Upvotes: 37
Reputation: 719386
See @andersoj's answer for a clear description of the difference between serializability and linearizability.
This is only indirectly relevant to Java concurrent programming. In general, a concurrent Java program does not need to have either a serializable or linearizable history. In the cases that do, serializability is generally sufficient for a program (Java or otherwise) for "correctness", though particular problems could require the stronger linearizability property. But either way, it is the problem that determines the correctness requirements, not Java.
Upvotes: 3
Reputation: 6317
Check Wikipedia's explanation:
http://en.wikipedia.org/wiki/Linearizability#Linearizability_versus_serializability
It can also be a bit confusing because we also use the term serialize to refer to converting a class into data stream for storage or network transmission.
Upvotes: 2