Reputation: 29
I am currently using ClickHouse to store few billions of data each week. We use aggregated tables to fetch data, so far so good. Now there is a need to fetch a single row from this database.
ClickHouse is not meant for such a case, even though after applying some optimization recommended by ClickHouse single row select is still somehow slow (few seconds).
to clarify this little more, this table is indexed by columns a,b,c, and d and also partitioned monthly (The table has some more columns). A new service has to query this table whereas only knows a and b and z (a UUID column). However, the average response is between 3 and 10 seconds for 10 billion data.
I have an opportunity to add an extra data store layer so that I can store the data into an extra database for this need.
Now the actual question: What could be the best database for such a case where we only need to read a single row of billions of data?
P.S:
Upvotes: 0
Views: 296
Reputation: 13350
Cassandra?
You can use an additional table and a materialized view to emulate inverted index.
This additional table should be sorted by z and contain pk columns (a,b,c, d) from the main table.
Then query the main table like
select ... from main_table where (a,b,c,d) in
( select a,b,c,d from additional_table where z= ... )
and z = ...
additional_table can be automatically filled by the materialized view from the main_table.
Upvotes: 1