Alex Tbk

Reputation: 2104

Is UUID or Integer a good choice as partition key?

Two simple questions:

Will any of these options create "hot" partitions?

Thanks!

Upvotes: 6

Views: 4530

Answers (2)

Aaron

Reputation: 57798

I usually tell folks not to use a UUID as a partition key for two simple reasons.

  1. UUIDs are designed to be unique, and thus have a high potential cardinality.
  2. While it does depend on your data model, think about how many rows you're going to have under each UUID, and then ask yourself if you really want to have to supply a full UUID on each and every query.

Again, it's all about the data model. From a DBA's perspective, UUIDs will distribute well. But from a developer's perspective, they can really clamp down on your potential query patterns.

Ultimately, you want your primary key components to allow your model to A) distribute well and B) match your query patterns. If partitioning on a UUID gives you that, then great!

Upvotes: 3

Alex Ott

Reputation: 87234

UUID is a good choice for a partition key: it should distribute well across the cluster nodes. A "unique" integer is trickier: some node needs to act as the authority for generating the number, and that is hard to do in a distributed environment.
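To illustrate the distribution point, here is a rough Python sketch. It is not how Cassandra actually places data: the 4-node cluster, the `node_for` helper, and the MD5-plus-modulo mapping are all assumptions made for the example (the real partitioner works on token ranges). It only shows that hashing random UUIDs spreads keys roughly evenly.

```python
import hashlib
import random
import uuid
from collections import Counter

NODES = 4  # hypothetical cluster size, chosen only for this example


def node_for(partition_key: str, nodes: int = NODES) -> int:
    """Map a partition key to a node index by hashing it.

    MD5 plus modulo is a stand-in for the real partitioner, which
    assigns token ranges to nodes instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


def random_uuids(count: int, seed: int = 42) -> list:
    """Generate reproducible version-4 style UUIDs from a seeded RNG."""
    rng = random.Random(seed)
    return [uuid.UUID(int=rng.getrandbits(128), version=4) for _ in range(count)]


def distribution(keys) -> Counter:
    """Count how many partition keys land on each node."""
    return Counter(node_for(str(key)) for key in keys)


if __name__ == "__main__":
    counts = distribution(random_uuids(10_000))
    for node in sorted(counts):
        print(f"node {node}: {counts[node]} partitions")
```

With 10,000 keys, each of the four nodes should end up with close to a quarter of the partitions. A sequential integer would hash just as evenly; the difficulty there is generating it without a central authority, not distributing it.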

Regarding hot partitions: this will depend on your data model. If your primary key has other components besides the partition key (clustering columns), then yes, you may have this problem, because many rows can pile up under a single partition key. For example, you generate a random UUID for a sensor and then start writing a lot of data into it.
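The sensor case can be sketched in a few lines of Python. The numbers (one busy sensor, nine quiet ones) and the helper names are made up for illustration; the only point is that a unique partition key says nothing about how many rows end up under it.

```python
import uuid
from collections import Counter


def simulate_writes(quiet_sensors: int = 9, quiet_rows: int = 10,
                    busy_rows: int = 10_000):
    """Build (sensor_id, timestamp) writes: one busy sensor, several quiet ones.

    sensor_id plays the role of the partition key and the timestamp
    plays the role of a clustering column.
    """
    busy = uuid.uuid4()
    writes = [(busy, ts) for ts in range(busy_rows)]
    for _ in range(quiet_sensors):
        sensor_id = uuid.uuid4()
        writes.extend((sensor_id, ts) for ts in range(quiet_rows))
    return busy, writes


def rows_per_partition(writes) -> Counter:
    """Count rows stored under each partition key."""
    return Counter(sensor_id for sensor_id, _ts in writes)


if __name__ == "__main__":
    busy, writes = simulate_writes()
    counts = rows_per_partition(writes)
    share = counts[busy] / len(writes)
    print(f"{len(counts)} partitions, busiest holds {share:.1%} of all rows")
```

All ten partition keys are unique, yet almost all of the rows (and the write traffic) go to a single partition, and therefore to whichever nodes own it.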

Upvotes: 4
