AntonBoarf

Reputation: 1313

Cassandra, Java and MANY async requests: is this good?

I'm developing a Java application with Cassandra, using this table:

id      | registration | name
1       | 1            | xxx
1       | 2            | xxx
1       | 3            | xxx
2       | 1            | xxx
2       | 2            | xxx
...     | ...          | ...
100,000 | 34           | xxx

My table has a very large number of rows (more than 50,000,000). I have a List<String> of ids, myListIds, to iterate over. I could use:

SELECT * FROM table WHERE id IN (1,7,18, 34,...,)
-- imagine more than 10,000,000 numbers in the 'IN' clause

But this is a bad pattern. So instead I'm using async request this way :

    // mapFutures : key = id, value = future holding that id's rows from Cassandra
    Map<String, ResultSetFuture> mapFutures = new HashMap<>();

    for (String id : myListIds)
    {
        ResultSetFuture resultSetFuture = session.executeAsync(statement.bind(id));
        mapFutures.put(id, resultSetFuture);
    }

Then I process the data with the getUninterruptibly() method.

Here is my problem: I'm making maybe more than 10,000,000 Cassandra requests (one request per 'id'), and I'm putting all the results into a Map.

Can this cause a heap memory error? What's the best way to deal with this?

Thank you

Upvotes: 3

Views: 694

Answers (2)

Mikhail Baksheev

Reputation: 1414

I see the following problems with your code:

  1. An overloaded Cassandra cluster: it won't be able to process that many async requests, and your requests will fail with NoHostAvailableException.
  2. An overloaded Cassandra driver: your client app will fail with IO exceptions, because it won't be able to process that many async requests (see the details about connection tuning at https://docs.datastax.com/en/developer/java-driver/3.1/manual/pooling/).
  3. And yes, memory issues are possible. It depends on the data size.

A possible solution is to limit the number of concurrent async requests and process the data in chunks (e.g. see this answer).
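The limiting idea above can be sketched with a Semaphore that bounds how many requests are in flight at once. This is a minimal, hedged illustration: `queryAsync` is a hypothetical stand-in for the driver's `session.executeAsync(statement.bind(id))`, and the class/parameter names (`ThrottledQueries`, `maxInFlight`) are mine, not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

public class ThrottledQueries {

    // Hypothetical stand-in for session.executeAsync(statement.bind(id));
    // a real application would call the DataStax driver here instead.
    static CompletableFuture<String> queryAsync(String id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "row-for-" + id, pool);
    }

    public static Map<String, String> fetchAll(List<String> ids, int maxInFlight)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Semaphore permits = new Semaphore(maxInFlight);
        Map<String, String> results = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> pending = new ArrayList<>();

        for (String id : ids) {
            permits.acquire(); // blocks once maxInFlight requests are outstanding
            pending.add(queryAsync(id, pool)
                    .thenAccept(row -> results.put(id, row))
                    .whenComplete((v, t) -> permits.release()));
        }
        pending.forEach(CompletableFuture::join); // drain the tail of requests
        pool.shutdown();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> rows = fetchAll(List.of("1", "7", "18", "34"), 2);
        System.out.println(rows.size());
    }
}
```

Note that this bounds concurrency, not total memory: if all 10,000,000 results are still collected into one map, the heap concern remains, so in practice you would also process and discard each chunk of results before issuing the next batch.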

Upvotes: 0

Chunker

Reputation: 223

Note: your question is "is this a good design pattern".

If you are having to perform 10,000,000 Cassandra data requests, then you have structured your data incorrectly. Ultimately, you should design your database from the ground up so that you only ever have to perform 1-2 fetches.

Now, granted, if you have 5,000 Cassandra nodes this might not be a huge problem (it probably still is), but it still reeks of bad database design. I think the solution is to take a look at your schema.
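As one hypothetical illustration of that advice: if the millions of ids you fetch together share some grouping attribute, Cassandra data modeling favors denormalizing into a table partitioned by that attribute, so a single partition read replaces the per-id requests. The table and column names below (`data_by_batch`, `batch_id`) are invented for the sketch; they assume such a grouping exists in your domain.

```
-- Partition by the attribute the ids are queried by, not by id:
CREATE TABLE data_by_batch (
    batch_id     int,
    id           bigint,
    registration int,
    name         text,
    PRIMARY KEY (batch_id, id, registration)
);

-- One partition read replaces millions of per-id requests:
SELECT * FROM data_by_batch WHERE batch_id = 42;
```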

Upvotes: 5
