Cassandra read process

Question

Say I have a table with 4 columns, and I write some data to it. If I try to read the data, the procedure goes like this. I want to understand a specific scenario in which all the columns (of the row I'm trying to read) are present in the memtable. Will the SSTables be checked for data for such a row? I think that in this case there's no need to check the SSTables, as the data present in the memtable will obviously be the latest copy. Reads in such cases should therefore be faster than those where the memtable either doesn't have the row or contains only partial data.

I created a table (user_data) and entered some data, which resulted in the creation of 2 SSTables. After this, I inserted a new row. I checked the data directory and made sure that the SSTable count was still 2, which means that the new row is still in the memtable. I turned tracing on in cqlsh and then selected the same row. The output is given below:

Tracing session: de2e8ce0-cf1e-11e6-9318-a131a78ce29a

 activity                                                                                     | timestamp                  | source        | source_elapsed | client
----------------------------------------------------------------------------------------------+----------------------------+---------------+----------------+---------------
                                                                           Execute CQL3 query | 2016-12-31 11:33:36.494000 | 172.16.129.67 |              0 | 172.16.129.67
 Parsing select address,age from user_data where name='Kishan'; [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            182 | 172.16.129.67
                                            Preparing statement [Native-Transport-Requests-1] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            340 | 172.16.129.67
                                  Executing single-partition query on user_data [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            693 | 172.16.129.67
                                                   Acquiring sstable references [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            765 | 172.16.129.67
                                                      Merging memtable contents [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |            821 | 172.16.129.67
                                         Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2016-12-31 11:33:36.495000 | 172.16.129.67 |           1028 | 172.16.129.67
                                                                             Request complete | 2016-12-31 11:33:36.495225 | 172.16.129.67 |           1225 | 172.16.129.67

I don't understand the meaning of "Acquiring sstable references" here. Since the complete data was in the memtable, as I understand it there's no need to check the SSTables. So what exactly are these references for?

MD Ruhul Amin · Accepted Answer

all the columns (of the row which I'm trying to read) are present in the memtable. Will SSTables be checked for data for such a row?

In this particular case, it will also check the SSTable data in parallel with the memtable.

It will only go to the SSTables (actually first the row cache, then the bloom filter, and then the SSTable) for columns that are not present in the memtable.
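To make the bloom filter step mentioned above more concrete, here is a minimal sketch of the idea: a filter can say "definitely not present" or "possibly present" for a partition key, which lets a read skip SSTables that cannot contain it. This is a toy illustration, not Cassandra's implementation; the class name `BloomFilterSketch`, the sizes, and the hash functions are all assumptions made for the example.

```java
import java.util.BitSet;

// Toy Bloom filter using double hashing over two simple string hashes.
// Hypothetical illustration only: not Cassandra's bloom filter code.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Records a key by setting one bit per hash function. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** False means the key was never added; true means it may have been added. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Derives the i-th bit position from two base hashes. */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1aStyle(key);
        return Math.floorMod(h1 + i * h2, size);
    }

    /** FNV-1a-style hash over the characters of the key. */
    private static int fnv1aStyle(String key) {
        int hash = 0x811C9DC5;
        for (int i = 0; i < key.length(); i++) {
            hash ^= key.charAt(i);
            hash *= 16777619;
        }
        return hash;
    }

    /** Example filter for an imaginary sstable that contains the partition key "Kishan". */
    static BloomFilterSketch demo() {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("Kishan");
        return filter;
    }

    public static void main(String[] args) {
        System.out.println("might contain Kishan: " + demo().mightContain("Kishan"));
    }
}
```

A key that was added is always reported as possibly present, so a filter never causes data to be missed; a key that was not added is usually, but not always, reported as absent.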

Edit:

To understand more about how the read process works here, let's dive into the Cassandra source. We'll begin with the trace log and walk through the steps line by line.

Let's start from here:

Executing single-partition query on user_data [ReadStage-2]

Your select query is clearly a single-partition query: Cassandra only needs to read data from one partition. Let's jump to the corresponding method; its Javadoc is self-explanatory:

/**
 * Queries both memtable and sstables to fetch the result of this query.
 * 
 * Please note that this method:
 *   1) does not check the row cache.
 *   2) does not apply the query limit, nor the row filter (and so ignore 2ndary indexes).
 *      Those are applied in {@link ReadCommand#executeLocally}.
 *   3) does not record some of the read metrics (latency, scanned cells histograms) nor
 *      throws TombstoneOverwhelmingException.
 * It is publicly exposed because there is a few places where that is exactly what we want,
 * but it should be used only where you know you don't need thoses things.
 * 
 * Also note that one must have created a {@code ReadExecutionController} on the queried table and we require it as
 * a parameter to enforce that fact, even though it's not explicitlly used by the method.
 */
public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, ReadExecutionController executionController)
{
    assert executionController != null && executionController.validForReadOn(cfs);
    Tracing.trace("Executing single-partition query on {}", cfs.name);

    return queryMemtableAndDiskInternal(cfs);
}

From the step above, we can see that your query calls the queryMemtableAndDiskInternal(cfs) method:

private UnfilteredRowIterator queryMemtableAndDiskInternal(ColumnFamilyStore cfs)
{
    /*
     * We have 2 main strategies:
     *   1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use
     *      unless we have a names filter that we know we can optimize futher.
     *   2) If we have a name filter (so we query specific rows), we can make a bet: that all column for all queried row
     *      will have data in the most recent sstable(s), thus saving us from reading older ones. This does imply we
     *      have a way to guarantee we have all the data for what is queried, which is only possible for name queries
     *      and if we have neither non-frozen collections/UDTs nor counters (indeed, for a non-frozen collection or UDT,
     *      we can't guarantee an older sstable won't have some elements that weren't in the most recent sstables,
     *      and counters are intrinsically a collection of shards and so have the same problem).
     */
    if (clusteringIndexFilter() instanceof ClusteringIndexNamesFilter && !queriesMulticellType())
        return queryMemtableAndSSTablesInTimestampOrder(cfs, (ClusteringIndexNamesFilter)clusteringIndexFilter());
    ...
    ...

Here we've found our answer from this comment:

We have 2 main strategies: 1) We query memtables and sstables simulateneously. This is our most generic strategy and the one we use........

Cassandra queries the memtables and SSTables simultaneously.
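As a rough illustration of what strategy 1 amounts to, the sketch below merges the same row as seen by a memtable and an sstable, keeping the cell with the newest write timestamp for each column. It is a toy model under stated assumptions: `GenericMergeSketch`, `Cell`, and `merge` are hypothetical names, and the real code merges iterators and also handles tombstones, which are left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Cassandra's own types.
final class GenericMergeSketch {

    /** A column value tagged with the timestamp of the write that produced it. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges the same row as seen by several sources; per column, the newest write wins. */
    static Map<String, Cell> merge(List<Map<String, Cell>> sources) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> source : sources) {
            for (Map.Entry<String, Cell> entry : source.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return merged;
    }

    /** Example data: the memtable holds a newer "age", an older sstable holds both columns. */
    static Map<String, Cell> demo() {
        Map<String, Cell> memtable = new HashMap<>();
        memtable.put("age", new Cell("31", 300L));

        Map<String, Cell> sstable = new HashMap<>();
        sstable.put("age", new Cell("30", 100L));
        sstable.put("address", new Cell("old street", 100L));

        return merge(Arrays.asList(memtable, sstable));
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Cell> entry : demo().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue().value);
        }
    }
}
```

In the example data, "age" comes from the memtable because its timestamp is newer, while "address" can only come from the sstable. That is why a partial row in the memtable cannot be answered from the memtable alone.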

After that, if we jump into the queryMemtableAndSSTablesInTimestampOrder method, we find:

/**
 * Do a read by querying the memtable(s) first, and then each relevant sstables sequentially by order of the sstable
 * max timestamp.
 *
 * This is used for names query in the hope of only having to query the 1 or 2 most recent query and then knowing nothing
 * more recent could be in the older sstables (which we can only guarantee if we know exactly which row we queries, and if
 * no collection or counters are included).
 * This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
 */
private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, ClusteringIndexNamesFilter filter)
{
    Tracing.trace("Acquiring sstable references");
    ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));

    ImmutableBTreePartition result = null;

    Tracing.trace("Merging memtable contents");
    .... // then it also looks into the sstables in timestamp order.

The code above accounts for the next two entries in the trace log:

Acquiring sstable references [ReadStage-2]

Merging memtable contents [ReadStage-2]
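The Javadoc above describes reading the memtable first and then the sstables from newest to oldest, hoping to stop early. The following sketch models that idea under simplifying assumptions: all names (`TimestampOrderSketch`, `SSTable`, `alreadyCovered`, and so on) are hypothetical, each sstable only exposes its maximum write timestamp, and tombstones, collections, and counters are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: every class and method here is hypothetical, not Cassandra's API.
final class TimestampOrderSketch {

    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Stand-in for an sstable: the cells it holds for one row, plus its max write timestamp. */
    static final class SSTable {
        final Map<String, Cell> cells;
        final long maxTimestamp;
        SSTable(Map<String, Cell> cells) {
            this.cells = cells;
            long max = Long.MIN_VALUE;
            for (Cell cell : cells.values()) max = Math.max(max, cell.timestamp);
            this.maxTimestamp = max;
        }
    }

    static final class Result {
        final Map<String, Cell> row = new HashMap<>();
        int sstablesRead = 0;
    }

    static Result read(Set<String> queried, Map<String, Cell> memtable, List<SSTable> sstables) {
        Result result = new Result();
        mergeInto(result.row, memtable, queried); // memtable contents come first

        List<SSTable> newestFirst = new ArrayList<>(sstables);
        newestFirst.sort(Comparator.comparingLong((SSTable s) -> s.maxTimestamp).reversed());

        for (SSTable sstable : newestFirst) {
            // This sstable and all older ones cannot override what we already hold.
            if (alreadyCovered(queried, result.row, sstable.maxTimestamp)) break;
            result.sstablesRead++;
            mergeInto(result.row, sstable.cells, queried);
        }
        return result;
    }

    /** True when every queried column already has a value newer than the sstable's newest write. */
    static boolean alreadyCovered(Set<String> queried, Map<String, Cell> row, long sstableMaxTimestamp) {
        for (String column : queried) {
            Cell cell = row.get(column);
            if (cell == null || cell.timestamp <= sstableMaxTimestamp) return false;
        }
        return true;
    }

    /** Copies queried columns from source into target when they are missing or newer. */
    static void mergeInto(Map<String, Cell> target, Map<String, Cell> source, Set<String> queried) {
        for (String column : queried) {
            Cell candidate = source.get(column);
            Cell current = target.get(column);
            if (candidate != null && (current == null || candidate.timestamp > current.timestamp)) {
                target.put(column, candidate);
            }
        }
    }

    static List<SSTable> demoSSTables() {
        Map<String, Cell> older = new HashMap<>();
        older.put("address", new Cell("old street", 100L));
        older.put("age", new Cell("29", 100L));
        Map<String, Cell> newer = new HashMap<>();
        newer.put("address", new Cell("new street", 200L));
        return Arrays.asList(new SSTable(older), new SSTable(newer));
    }

    static Set<String> demoColumns() {
        return new LinkedHashSet<>(Arrays.asList("address", "age"));
    }

    static Result demoFullRowInMemtable() {
        Map<String, Cell> memtable = new HashMap<>();
        memtable.put("address", new Cell("latest street", 300L));
        memtable.put("age", new Cell("31", 300L));
        return read(demoColumns(), memtable, demoSSTables());
    }

    static Result demoPartialRowInMemtable() {
        Map<String, Cell> memtable = new HashMap<>();
        memtable.put("age", new Cell("31", 300L));
        return read(demoColumns(), memtable, demoSSTables());
    }

    public static void main(String[] args) {
        System.out.println("full row in memtable, sstables read: " + demoFullRowInMemtable().sstablesRead);
        System.out.println("partial row in memtable, sstables read: " + demoPartialRowInMemtable().sstablesRead);
    }
}
```

In this model, the list of sstables is always obtained up front, but when both queried columns are in the memtable with newer timestamps, none of them is actually read. When only "age" is in the memtable, the newest sstable is read for "address" and the older one is skipped. The real method has more conditions than this sketch, so treat it as an aid to reading the Javadoc rather than a description of exact behaviour.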

Hope this helps.

Related links: Source: SinglePartitionReadCommand.java
