Reputation: 1553

Suggest a database for key with multiple values , highly scalable

We have data with key-multipleValues. Each key can have around 500 values (each value will be around 200-300 chars) and the number of such keys will be around 10 million. Major operation is to check for a value given a key.

I've been using mysql for long time where i've got 2 options: one row for each keyvalue, one row for each key with all values in a text field.But these does not seem efficient to me as the first model has lot of rows,redundancies and second model text field will become very large .

I am considering using nosql database for this purpose, i've used mongodb before and i dont think it is suitable for my current case. keyvalue based or column family based nosql db would be better.It need not be distributed.Someone who used riak,redis,cassandra etc pls share your thoughts.

Thanks

Upvotes: 5

Answers (4)

Nishant

Reputation: 55856

From your description, it seems some sort of Key-value store will be better for you comparing relational DB.

The data itself seem to be a non-relational, why store in a relational storage? It seems valid to use something like Cassandra.

I think a typical data-structure for this data to store will be a column family, with Key as Row-key and Columns as value.

MyDATA: (ColumnFamily)
  RowKey=>Key
    Column1=>val1
    Column2=>val2
    ...
    ...
    ColumnN=valN

The data would look like (JSON notation):

MyDATA (CF){
    [
      {key1:[{val1-1:'', timestamp}, {val1-2:'', timestamp}, .., {val1-500:'', timestamp}]},
      {key2:[{val2-1:'', timestamp}, {val2-2:'', timestamp}, .., {val2-500:'', timestamp}]},
      ...
      ...
    ]
}

Hopefully this helps.

Upvotes: 2

Joshua Martell

Reputation: 7212

I don't think this is beyond the scale of MySQL on a single machine. You'll need to tune inserts or it'll take forever to load. You might also consider compressing your values using COMPRESS() or in your app directly. Might save you 50% or so.

Redis is basically an in-memory database, so it's probably out. Riak might be a decent choice or HBase or Cassandra.

Upvotes: 0

jbellis

Reputation: 19377

This does map straightforwardly to Cassandra. Row key will be your model key, and your model values will be column names (yes, names) in Cassandra. You can leave the Cassandra column value empty, or add metadata there such as timestamp if that would be useful.

Upvotes: 2

Kerrek SB

Reputation: 476950

Try the direct, normalized approach: One table with this schema:

id (primary key)
key
value

You have one row for every key->value relation

Add an index for each column, and lookup should be reasonably efficient. Have you profiled any of this to exhibit a bottleneck?

Upvotes: 2

Suggest a database for key with multiple values , highly scalable

Answers (4)

Related Questions