What is the best way to read data from Cassandra in parallel?

Question

I'm new to Cassandra and I'm trying to figure out how I should store data so that I can perform fast reads in parallel. I have read that partitioning data can cause performance issues. Is it possible to read data from the same partition of a Cassandra table in parallel?

Aaron · Accepted Answer

DataStax's Olivier Michallat has a good blog post that discusses this:

Asynchronous queries with the Java Driver

In that article, he describes how to run queries in parallel to avoid the problems associated with multi-partition-key queries.

His example starts from a single query (run from Java) that covers several partition keys, like this:

SELECT * FROM users WHERE id IN (
    e6af74a8-4711-4609-a94f-2cbfab9695e5,
    281336f4-2a52-4535-847c-11a4d3682ec1);

A better way is to run one query per key asynchronously and collect the results through a "future", like this:

Future<List<ResultSet>> future = ResultSets.queryAllAsList(session,
    "SELECT * FROM users WHERE id = ?",
      UUID.fromString("e6af74a8-4711-4609-a94f-2cbfab9695e5"),
      UUID.fromString("281336f4-2a52-4535-847c-11a4d3682ec1")
);

for (ResultSet rs : future.get()) {
    ... // here is where you process the result set    
}
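To make the fan-out idea concrete, here is a minimal sketch of what such a helper might look like. It is not the code from the article: it uses the JDK's CompletableFuture instead of the driver's own future types, and fetchUser is a hypothetical stand-in for an asynchronous driver call (something along the lines of session.executeAsync), so it runs without a cluster.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

class ParallelReads {

    // Fires one asynchronous query per key, then combines the futures into a
    // single future whose result list is in the same order as the keys.
    static <K, R> CompletableFuture<List<R>> queryAllAsList(
            List<K> keys, Function<K, CompletableFuture<R>> queryOne) {
        // Start every query before waiting on any of them.
        List<CompletableFuture<R>> pending = keys.stream()
                .map(queryOne)
                .collect(Collectors.toList());
        // allOf completes once every query has completed, so join() below does not block.
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                .thenApply(done -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    // Hypothetical stand-in for an async single-partition query
    // (assumption: a real version would call the driver here).
    static CompletableFuture<String> fetchUser(UUID id) {
        return CompletableFuture.supplyAsync(() -> "row for " + id);
    }

    public static void main(String[] args) {
        List<UUID> ids = List.of(
                UUID.fromString("e6af74a8-4711-4609-a94f-2cbfab9695e5"),
                UUID.fromString("281336f4-2a52-4535-847c-11a4d3682ec1"));
        // Block once, on the combined future, rather than once per query.
        List<String> rows = queryAllAsList(ids, ParallelReads::fetchUser).join();
        rows.forEach(System.out::println);
    }
}
```

Note that in this sketch a single failed query fails the combined future. Whether you want that, or would rather collect partial results, depends on your use case.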

As for querying data from within the same partition in parallel: yes, you can. I assume that you mean rows with differing clustering keys (otherwise there would be no point), and that should work in much the same way as the example above.
