Ruben Verborgh

Reputation: 3665

Database storage with fast elementary queries and counts for 500M items

I'm looking for a storage solution that is (only) fast for elementary queries and counts on a large but simple data set; there are no other constraints.

The data consists of 500 million items shaped like the following:

{
    a: VARCHAR(100)
    b: VARCHAR(100)
    c: VARCHAR(100)
}

I want to solve the following elementary queries fast:
“give me page N of items where a = X AND b = Y AND c = Z”.
Any of the three constraints is optional; e.g., a = X alone is also a valid query.
So I only want to query on field equality within a single row.
(Other queries are not important. Joins will never be necessary.)
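The query shape above is just a conjunction of equality predicates plus LIMIT/OFFSET pagination. A minimal sketch, using Python's built-in sqlite3 as a stand-in for whatever engine ends up being chosen (the table name, page size, and sample rows are assumptions for illustration):

```python
import sqlite3

# In-memory stand-in; the real data set (500M rows) would live in a server database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (a TEXT, b TEXT, c TEXT)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [("x", "y", "z"), ("x", "y", "w"), ("x", "q", "z")])

def page(filters, page_no, page_size=2):
    """Return one page of items matching the given equality filters.

    filters: dict with any subset of the keys 'a', 'b', 'c'
    (only those fixed column names are ever interpolated into the SQL).
    """
    where = " AND ".join(f"{col} = ?" for col in filters) or "1=1"
    sql = (f"SELECT a, b, c FROM items WHERE {where} "
           f"ORDER BY rowid LIMIT ? OFFSET ?")
    params = list(filters.values()) + [page_size, page_no * page_size]
    return con.execute(sql, params).fetchall()

print(page({"a": "x"}, 0))             # first page of items with a = 'x'
print(page({"a": "x", "b": "y"}, 0))   # two constraints at once
```

Each of the seven possible filter combinations is served by the same statement shape, which is why a per-column index (or a columnar engine) covers all of them.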

Additionally, I want to have fast count estimates for each of those queries.
For instance, I'd like to know that there are approximately 2 million items where a = X.
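The count is the same WHERE clause fed to COUNT(*); note that MySQL can also return a cheap estimate without scanning, via the `rows` column of `EXPLAIN SELECT ...`, taken from index statistics. A sketch of the exact-count variant (self-contained, with assumed sample data):

```python
import sqlite3

# Tiny stand-in data set; a real deployment would hold 500M rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (a TEXT, b TEXT, c TEXT)")
con.execute("CREATE INDEX idx_a ON items (a)")  # makes the a = X count an index-only scan
con.executemany("INSERT INTO items VALUES (?, ?, ?)",
                [("x", "y", "z"), ("x", "y", "w"), ("p", "q", "r")])

def count(filters):
    """Exact count of rows matching the given equality filters (keys from a/b/c)."""
    where = " AND ".join(f"{col} = ?" for col in filters) or "1=1"
    sql = f"SELECT COUNT(*) FROM items WHERE {where}"
    return con.execute(sql, list(filters.values())).fetchone()[0]

print(count({"a": "x"}))  # exact count of items where a = 'x'
```

If an approximation is acceptable, the estimate route avoids touching millions of index entries per query.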

What storage solutions / configurations are the best match and why?

Update: it might also be good to know that the underlying data rarely changes and is added all at once in batches of ± 20 million items.



So far I have tried ingesting the data into a MySQL database, with an index on each of the columns. I got as far as 100 million items, and query performance is very good. However, the server has become extremely unstable, and the import process (using LOAD DATA) halts randomly.
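Since the data arrives in large, infrequent batches, one common mitigation is to load the rows first, inside a single transaction and with no secondary indexes defined, and build the indexes once afterwards; per-row index maintenance is typically what makes large imports crawl. A sketch of that pattern (sqlite3 stands in for the real engine; batch contents are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (a TEXT, b TEXT, c TEXT)")

# Simulate one batch; the real batches hold roughly 20M rows each.
batch = [(f"a{i % 5}", f"b{i % 7}", f"c{i % 11}") for i in range(1000)]

# Load first, in one transaction, while the table has no indexes...
with con:
    con.executemany("INSERT INTO items VALUES (?, ?, ?)", batch)

# ...then build each per-column index in a single pass over the loaded data.
for col in ("a", "b", "c"):
    con.execute(f"CREATE INDEX idx_{col} ON items ({col})")

print(con.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 1000
```

Whether dropping and rebuilding indexes around each LOAD DATA actually cures the instability depends on the engine and server configuration, but it removes the main per-row cost from the import path.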

Upvotes: 2

Views: 268

Answers (1)

Hannes Mühleisen

Reputation: 2552

How frequent are your values? For example, are all a, b and c entries distinct? If not, mapping them to integers would be a good idea to speed up queries.
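The mapping suggested here is essentially dictionary encoding: each distinct string gets a small integer id, so equality queries become integer comparisons and the stored column shrinks. A minimal sketch (the sample values are made up):

```python
def encode(values):
    """Dictionary-encode a column: return (codes, dictionary) for a list of strings."""
    dictionary = {}
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # next unused integer id
        codes.append(dictionary[v])
    return codes, dictionary

a_column = ["red", "blue", "red", "red", "green"]
codes, dictionary = encode(a_column)
print(codes)       # [0, 1, 0, 0, 2]
print(dictionary)  # {'red': 0, 'blue': 1, 'green': 2}

# The query a = 'red' becomes one dictionary lookup plus integer comparisons.
matches = [i for i, c in enumerate(codes) if c == dictionary["red"]]
print(matches)     # [0, 2, 3]
```

Columnar engines apply this encoding automatically for low-cardinality string columns, which is part of why they suit this workload.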

Regardless, I suggest you try the free and open-source database MonetDB (http://www.monetdb.org/) (disclosure: I work for the research group that builds it); the columnar approach might work very well here. I recently used it for a similar use case with over 100 billion rows, and it worked great.

Upvotes: 1
