Reputation: 180
I'm designing a pipeline with the following functionality:
1. Take a StrId (String) as input.
2. Keep a mapping KV<StrId, IntId>, where IntId is a unique integer.
3. Look up StrId in that mapping:
   - If StrId is found, return the corresponding IntId.
   - If StrId is not found, generate a new IntId sequentially, add it to the mapping and also write it to Bigtable.
4. Use IntId downstream.

I'm wondering whether the state approach would fit my needs here, and whether Bigtable is the right storage technology to use. The mapping between StrId and IntId would have to be persisted across all workers in order to keep IntIds unique.
Also, any links to code examples would be greatly appreciated. I'm aware of this Stack Overflow question and this blog post.
(For the downstream calculations I need integer IDs, so there's no way around that.)
Upvotes: 1
Views: 231
Reputation: 2711
This sounds very much like what OpenTSDB does to manage strings in its tsdb-uid table. That process combines an increment (aka ReadModifyWrite) to get a unique ID (an int64 / long) with a CheckAndMutate to ensure that each string ends up with exactly one mapping. It takes more work than what you get out of SQL systems.
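To make the increment plus CheckAndMutate idea concrete, here is a minimal sketch of the pattern. It does not use the Bigtable client: `InMemoryTable`, `COUNTER_KEY` and `get_or_create_int_id` are hypothetical names, and a lock stands in for the atomicity that the real row operations would provide. Treat it as an illustration of the control flow, not as OpenTSDB's actual code.

```python
import threading


class InMemoryTable:
    """Hypothetical stand-in for a table with two atomic row operations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}

    def increment(self, key, amount=1):
        # Atomic read-modify-write: add `amount` and return the new value.
        with self._lock:
            self._cells[key] = self._cells.get(key, 0) + amount
            return self._cells[key]

    def check_and_put(self, key, value):
        # Atomic check-and-mutate: write only if `key` has no value yet.
        # Returns True if this call performed the write.
        with self._lock:
            if key in self._cells:
                return False
            self._cells[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._cells.get(key)


# Assumed name for the counter row; it must never collide with a real StrId.
COUNTER_KEY = "__max_id__"


def get_or_create_int_id(table, str_id):
    """Return the IntId for str_id, assigning a new one if needed."""
    existing = table.get(str_id)
    if existing is not None:
        return existing
    # Reserve a fresh id, then try to claim the mapping for this string.
    candidate = table.increment(COUNTER_KEY)
    if table.check_and_put(str_id, candidate):
        return candidate
    # Another writer claimed str_id first, so its id is the canonical one.
    return table.get(str_id)
```

Note that when two writers race on the same new string, the loser's reserved id is simply dropped. In this sketch the ids stay unique but are not guaranteed to be gapless.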
That said, Cloud Bigtable is not ideal for managing small tables like the uid table (i.e. less than a couple of GB). If you're already using Cloud Bigtable to store lots of data, you can consider keeping the uid table there as well. However, I would still suggest looking at a SQL alternative for that functionality.
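For comparison, here is a rough sketch of the same lookup-or-assign step on a SQL store. SQLite is used purely because it ships with Python; it is an assumption for illustration and not a recommendation of a specific database. The table name `uid` and the helpers `open_uid_table` and `lookup_or_insert` are hypothetical.

```python
import sqlite3


def open_uid_table(path=":memory:"):
    """Open (or create) a small uid table; `path` defaults to an in-memory DB."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uid ("
        " int_id INTEGER PRIMARY KEY,"   # assigned by the database
        " str_id TEXT NOT NULL UNIQUE)"  # one row per string
    )
    return conn


def lookup_or_insert(conn, str_id):
    """Return the integer id for str_id, inserting a new row if it is unknown."""
    with conn:  # commits on success, rolls back on error
        # The UNIQUE constraint makes a repeated insert a no-op.
        conn.execute("INSERT OR IGNORE INTO uid (str_id) VALUES (?)", (str_id,))
        row = conn.execute(
            "SELECT int_id FROM uid WHERE str_id = ?", (str_id,)
        ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_uid_table()
    for name in ("foo", "bar", "foo"):
        print(name, lookup_or_insert(conn, name))
```

Here the uniqueness constraint and the database-assigned key do the work that increment plus CheckAndMutate do by hand in the Bigtable approach. The sketch assumes rows are never deleted from the table.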
Upvotes: 1