Transaction Phenomena – Part 3: The Phantom Read

March 29, 2010

Transaction Phenomena – Part 3: The Phantom Read

Filed under: SQLServerPedia Syndication,Technical Articles — Michael J. Swart @ 11:57 am

SERIES: Concurrent Transaction Phenomena

In Books Online (BOL), Microsoft describes different kinds of transaction isolation levels in terms of phenomena that can occur during concurrent transactions. Specifically, they mention three kinds of phenomena: Dirty Reads, Non-Repeatable Reads, and Phantom Reads. You may have heard of these before, but correct me if I’m wrong, I just can’t find a good definition anywhere on BOL.

And that’s too bad, because these phenomena don’t just help describe the different transaction isolation levels. The phenomena actually define these levels.

These terms actually come from the ISO/ANSI standard for SQL-92. The terms themselves however do little to illuminate exactly the meaning of each. (What’s a phantom read?) When used by the ANSI/ISO standard, they mean something very specific and they’re key to understanding the different isolation levels.

In the next few days, I’d like to illustrate each phenomenon:

Part 1: The Dirty Read (reading tentative data)
Part 2: The Non-Repeatable Read (reading changed data)
Part 3: The Phantom Read (reading new data)
Part 4: Serializable vs. Snapshot

Part 3: The Phantom Read

Imagine your transaction retrieves rows filtered by some criteria. A phantom read occurs when your transaction retrieves rows using the same criteria and new rows are returned. In this case, either the new rows you see have been inserted or data has changed in existing rows such that these rows now meet the filter criteria. Either way these rows are new to you.

From ISO/ANSI: Process P1 reads the set of rows N that satisfy some search condition. Process P2 then executes statements that generate one or more rows that satisfy the search condition. If P1 repeats the query it obtains a different collection of rows.

Example

It’s cheesy, but I use the following example as a mnemonic. Imagine there’s a database of pop culture, and that transactions can last years: Then we have

use PopCultureDb
 
SELECT Name
FROM CarModels
WHERE Manufacturer = 'Rolls Royce'
-- over 1925
-- query would give new row: "The Phantom I"
 
SELECT Name
FROM TonyAwards
WHERE AwardName = 'Best Musical'
-- over 1988
-- query would give new row: "Phantom of the Opera"
 
SELECT Name
FROM Movies
WHERE SagaName = 'Star Wars'
-- over 1999
-- query would give new row "Star Wars: Episode I - The Phantom Menace"

In each example if a transaction lasted over the years indicated, the new rows are examples of phantom reads.

Phantom Reads vs. Non-Repeatable Reads

So what’s the difference between phantom reads and non-repeatable reads? This can cause confusion sometimes. Because they both have the property that a repeated query returns results that are different. But the distinction between phantom reads and non-repeatable reads lies in how the query results are different.

The Non-Repeatable Read is a phenomena specific to a read of a single row. When data has changed in this row, and the row is read again, the changed data is returned. This is a non-repeatable read.
So what about a change that affects multiple rows? This certainly seems like a phenomenon that applies to more than one row. But actually, I think of this as just a set of non-repeatable reads where each non-repeatable read still only affects one row.
The Phantom Read is a phenomenon that deals with queries that return sets. The thing that’s changing in a phantom read is not the data in the rows; it’s the set membership that has changed. *

Isolation Levels that Experience Phantom Reads

Just like non-repeatable reads, the levels READ COMMITTED and READ UNCOMMITTED can experience phantom reads. But REPEATABLE READ can experience phantom reads too. In fact the only isolation level that avoids phantom reads is SERIALIZABLE (and SNAPSHOT, but I’ll get to that in part 4).

*-For rows that have been deleted, if a transaction reads them (or rather fails to read them) it would seem that this is both a non-repeatable read and a phantom read. But for the purposes of the ISO/ANSI standard it is in fact considered a non-repeatable read.

-- Comments (8)

8 Comments »

Thanks a bunch, Michael…

Now I’ll never read another article about phantom reads without thinking of British cars or Andrew Lloyd Webber or JarJar Binks. 8^)

I’m enjoying the series… looking forward to the next installment.

–Brad

Comment by Brad Schulz — March 29, 2010 @ 12:10 pm
Thanks Brad, (I was sorely tempted to start this comment with “meesa” but then thought better of it).

Comment by Michael J. Swart — March 29, 2010 @ 12:14 pm
Hi Michael,

Great explanation, now i will never forget the difference between the Non-repeatable reads and Phantom reads.

Great work.

Comment by Ramkrishna — April 23, 2010 @ 11:24 am
Thanks! I’m glad to hear it.

Comment by Michael J. Swart — April 23, 2010 @ 11:27 am
Great article, especially the clarification part about Phantom Reads vs. Non-Repeatable Reads. I find so many articles on the same thing, but most of them seem to expect the reader to read between the lines.
Thanks for not doing that!

Comment by Hrish — January 15, 2012 @ 2:13 am
I’m sure it’s obvious, but what is the mnemonic?

Comment by William — January 8, 2014 @ 9:17 am
By mnemonic, I just mean that I when I think of this example, I remember that new rows are returned after some time is passed. And if I can remember that each of the new rows contains “Phantom” in some way, then I remember that this phenomenon is called a “Phantom Read”.

I just looked up the definition of mnemonic and I was worried that it was only for acronyms and other lexicographic tricks. Nope, itturns out I used the word correctly here (yay me!).

Comment by Michael J. Swart — January 8, 2014 @ 9:27 am
Great article! I had some troubles with understanding the exact meaning of the phenomena, and this article made it clear. Thank you 🙂

Comment by Jolanta — August 27, 2016 @ 9:03 am