twds

Reputation: 343

HBase multiple column families vs multiple tables

I'm developing HBase storage for data generated by different sources. Columns from the same source are usually retrieved together. The expected write/read ratio ranges roughly from 1/10 to 1/100, depending on the source.

So there are two choices for me: one table with multiple column families, or multiple tables.

Here is my understanding so far; please correct me if anything is wrong.

Any suggestions, or are there other factors I should consider before making the decision? Are there typical cases where multiple tables or multiple column families outperform the other approach?

Thanks

Upvotes: 1

Views: 1281

Answers (1)

AdamSkywalker

Reputation: 11609

Your points are correct. Just follow a simple rule:

If data from different sources is related and has the same keys, or the keys can be transformed into the same key, put it in the same table in different column families. You will get better scans and better data arrangement.

If the data can't be kept together, put it in separate tables. One big table will only cause problems: you'll have longer scans and most of the column families will be empty.
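To make the rule concrete, here is a rough sketch in plain Python. It is a toy model, not the HBase client API: a table is modeled as row key -> column family -> columns, and the source names, row keys, and the helpers `build_table` and `empty_family_ratio` are all made up for illustration. It compares two sources that share keys with two sources whose keys never overlap.

```python
from collections import defaultdict


def build_table(sources):
    """Toy model of one wide table: row key -> column family -> {qualifier: value}.

    `sources` maps a source name (used here as the column family name) to
    {row_key: {qualifier: value}}. Plain Python only, not the HBase API.
    """
    table = defaultdict(dict)
    for family, rows in sources.items():
        for row_key, columns in rows.items():
            table[row_key][family] = dict(columns)
    # HBase keeps rows sorted by key, so the model does the same.
    return dict(sorted(table.items()))


def empty_family_ratio(table, families):
    """Fraction of (row, family) slots that hold no data."""
    slots = len(table) * len(families)
    filled = sum(len(row) for row in table.values())
    return (slots - filled) / slots if slots else 0.0


# Hypothetical data: two sources keyed by the same user ids...
shared = {
    "profile": {"user1": {"name": "Ann"}, "user2": {"name": "Bob"}},
    "clicks": {"user1": {"count": 3}, "user2": {"count": 7}},
}
# ...and two sources whose keys never overlap.
disjoint = {
    "profile": {"user1": {"name": "Ann"}, "user2": {"name": "Bob"}},
    "sensors": {"dev9": {"temp": 21}, "dev8": {"temp": 19}},
}

shared_table = build_table(shared)
disjoint_table = build_table(disjoint)

print(empty_family_ratio(shared_table, list(shared)))      # 0.0
print(empty_family_ratio(disjoint_table, list(disjoint)))  # 0.5
```

With shared keys, every row carries data in both families, so one row read returns both sources. With disjoint keys, the table has twice as many rows and each row uses only one family, which is the case where separate tables make more sense.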

Upvotes: 1
