nevsv

Reputation: 2466

Cassandra Insert and Update differences in performance

In Cassandra, a row created by UPDATE differs from a row created by INSERT, and the difference affects how TTL behaves and how rows whose non-key columns are all null are treated.

Apart from this behavior, does the difference have any effect on performance when creating, deleting, or selecting such a row?

Link to JIRA describing this behavior:

https://issues.apache.org/jira/browse/CASSANDRA-8430

Upvotes: 1

Views: 3131

Answers (1)

nevsv

Reputation: 2466

1. "Execution plan": executing the same query (select by primary key) and reading the source_elapsed column:

Create as Insert:

2266,1768,1672,3302,3324,1422,1623,3833,3933,3519,4166. Avg: 2803

Create as Update:

1621,3498,4769,3680,3905,1781,4215,3764,3747,3460,1987. Avg: 3312

Update may look a bit slower, but the difference is not consistent, and I believe that with a higher number of executions the two would come out the same.
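As a rough sanity check on the numbers above, here is a minimal Python sketch that recomputes the averages and compares the gap between them with the spread of the samples. The variable names are mine, and the Welch-style t statistic assumes the eleven runs in each group are independent samples, which this small test does not guarantee.

```python
import math
import statistics

# source_elapsed values copied from the two lists above
insert_runs = [2266, 1768, 1672, 3302, 3324, 1422, 1623, 3833, 3933, 3519, 4166]
update_runs = [1621, 3498, 4769, 3680, 3905, 1781, 4215, 3764, 3747, 3460, 1987]

mean_insert = statistics.mean(insert_runs)
mean_update = statistics.mean(update_runs)

# Sample standard deviations show how noisy each group is
sd_insert = statistics.stdev(insert_runs)
sd_update = statistics.stdev(update_runs)

# Welch-style t statistic: difference of means over its standard error
std_error = math.sqrt(sd_insert ** 2 / len(insert_runs) + sd_update ** 2 / len(update_runs))
t_stat = (mean_update - mean_insert) / std_error

print(f"insert: mean={mean_insert:.0f} sd={sd_insert:.0f}")
print(f"update: mean={mean_update:.0f} sd={sd_update:.0f}")
print(f"t statistic: {t_stat:.2f}")
```

The gap between the two means is smaller than the standard deviation of either group, and the t statistic stays well below 2, so these eleven runs alone do not show a reliable difference.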

2. Storage:

Row created as Insert:

[user1]@184 Row[info=[ts=1486368137507000 ttl=3600, let=1486371737] ]: 2017-01-01 14:00Z, bla, 5,2 | [blu=77777 ts=1486368137507000 ttl=3600 ldt=1486371737], [ble=0 ts=1486368137507000 ttl=3600 ldt=1486371737]

Row created as Update:

[user30]@122 Row[info=[ts=-9223372036854775808] ]: 2017-01-01 14:00Z, bla, 5,2 | [blu=777 ts=1486368139142000 ttl=3600 ldt=1486371739], [ble=1 ts=1486368139142000 ttl=3600 ldt=1486371739]

I assume that sstabledump really does show the data as it is saved in the file. The only difference is that the row created by insert carries ttl and let fields at the row level (and its ts is set to the creation time). This is why rows whose non-key columns are all null are selectable when created by insert and not selectable when created by update. So rows created with insert use a few more bytes of storage; that is the whole difference here.
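To make the visibility point concrete, here is a toy Python model of it. This is not Cassandra code: the `ToyRow` class, the `has_row_marker` flag, and `is_selectable` are names I made up for illustration, and the model ignores TTL expiry and timestamps entirely. It only encodes the observation that a row stays selectable if it has row-level liveness info or at least one live cell.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyRow:
    # True when the row has row-level liveness info (the insert case in the dump)
    has_row_marker: bool
    # Non-key columns; None stands for a null (deleted) cell
    cells: Dict[str, Optional[int]] = field(default_factory=dict)

    def is_selectable(self) -> bool:
        # Selectable if the row marker exists or any cell still holds a value
        return self.has_row_marker or any(v is not None for v in self.cells.values())


# Values taken from the two dumps above
inserted = ToyRow(has_row_marker=True, cells={"blu": 77777, "ble": 0})
updated = ToyRow(has_row_marker=False, cells={"blu": 777, "ble": 1})

# Set every non-key column to null in both rows
for row in (inserted, updated):
    row.cells = {name: None for name in row.cells}

print("created by insert, all nulls:", inserted.is_selectable())
print("created by update, all nulls:", updated.is_selectable())
```

In this model the row created by insert stays selectable after all its non-key columns are nulled, while the row created by update does not, which matches the behavior described above.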

3. Tombstones:

Created as Insert:

[user1]@48 Row[info=[ts=-9223372036854775808] ]: 2017-01-01 14:00Z, bla, 5,2 | [blu= ts=1486368407044000 ldt=1486368406], [ble= ts=1486368407044000 ldt=1486368406]

Created as Update:

[user30]@0 Row[info=[ts=-9223372036854775808] ]: 2017-01-01 14:00Z, bla, 5,2 | [blu= ts=1486368403444000 ldt=1486368403], [ble= ts=1486368403444000 ldt=1486368403]

As expected, the tombstones look exactly the same for both kinds of creation.

Summary:

From my observations, there is no real difference in performance between the two types of row creation. I would be happy to see other tests, observations, or source code reviews here.

Upvotes: 7
