Pierre
Pierre

Reputation: 394

Tombstone warning with SELECT LIMIT 1

CREATE TABLE test (
    ck INT, 
    pk INT, 
    PRIMARY KEY (ck, pk)
);

for (int i = 1; i < 10000; i++) {
    sessionRW.execute(QueryBuilder.insertInto("test").value("ck", 1).value("pk", i));
}

root@cqlsh:ks> select * from test limit 5;

 ck | pk
----+----
  1 |  1
  1 |  2
  1 |  3
  1 |  4
  1 |  5

(5 rows)


root@cqlsh:ks> delete from test where ck = 1;

root@cqlsh:ks> insert into test(ck,pk) values (1, 0); -- new minimal value
root@cqlsh:ks> select * from test limit 1;

 ck | pk
----+-------
  1 | 0

(1 rows)

WARN  11:37:39 Read 1 live and 9999 tombstoned cells in ks.test (see tombstone_warn_threshold). 1 columns was reque

Why when I do a SELECT with "LIMIT 1" I get the tombstone warning ?

The rows are order by pk ASC and the lower pk value of this table (0) is the first row and is not deleted.

I don't understand why cassandra keep scanning my table for other results (hence fetching a lot of tombstone) because the first row match and I specified I just want one row.

I could have understood the warning If I didn't specified LIMIT. But what's the point of scanning the whole table when first row match with LIMIT 1?

Upvotes: 2

Views: 1764

Answers (3)

Pierre
Pierre

Reputation: 394

OK so I think I found the answer, the answer is cassandra is doing one more lookup after limit 1 (like if you did limit 2).

Just insert one more row:

insert into test(ck,pk) values (1, 1);

and now select * from test limit 1; won't trigger a tombstone error.

However, if you do LIMIT 2, it will trigger a tombstone error even if we have 2 valid rows, first in the table order.

Why cassandra is doing (limit+1) lookup is the question.

Upvotes: 0

Jim Meyer
Jim Meyer

Reputation: 9475

When you do "select * from test limit 1;", Cassandra has to go to all the nodes and filter the entire table to find the first live row. It needs to stream the tombstones to the coordinator since other nodes may be out of sync and the limit 1 would match a row that had been deleted. You should be able to avoid this by specifying the query such that the tombstones wouldn't matter, such as "select * from test where ck=1 and pk < 1;"

Upvotes: 0

Roman Tumaykin
Roman Tumaykin

Reputation: 1931

Because the way cassandra stores data. The data is stored as a single wide row as columns even if it looks through cql like multiple rows. Therefore in order to get to the lastly inserted "row" it needs to read all tombstoned columns as well.

Below is an illustration

    | 1 | 2 | 3 |...|9999| 0 |
----+---+---+---+---+----+---+
ck=1| T | T | T | T |  T |   |

As you can see it is one row under a clustering key 1. I marked tombstoned columns (or rows if you prefer) with "T". Cassandra reads the entire row, and then in order to find first non-tombstoned column, it needs to cycle through all 9999 tombstoned ones.

Upvotes: 3

Related Questions