David Parks

Reputation: 32081

Hadoop: How to create an auto-increment id

I need the equivalent of SQL's AUTO_INCREMENT id in Hadoop.

When my reduce task identifies a new item, that item needs a unique ID assigned.

Upvotes: 3

Views: 2718

Answers (1)

Ray Toal

Reputation: 88428

For distributed ID generation, you can either generate UUIDs or use Apache ZooKeeper, which provides distributed coordination on Hadoop clusters. Disclaimer: I have never used ZooKeeper, so I don't know whether it can actually (even in theory) give you a globally contiguous set of IDs, which is what the question seems to be asking for.

UUIDs do have a cost, though: each one takes some time to generate.
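
As a rough illustration of the UUID route, here is a minimal Java sketch. It assumes the IDs only need to be unique, not contiguous or ordered, so it is not a true AUTO_INCREMENT replacement. The class and method names (`UniqueIds`, `newId`) are made up for this example; the only library call is `java.util.UUID.randomUUID()` from the standard library.

```java
import java.util.UUID;

// Hypothetical helper for illustration; not part of Hadoop.
final class UniqueIds {

    // Returns a random (version 4) UUID as a 36-character string.
    // No coordination between tasks is needed, but the values are
    // neither sequential nor contiguous.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Each call yields a fresh ID, e.g. one per new item seen in a reducer.
        System.out.println(newId());
        System.out.println(newId());
    }
}
```

In a reduce task you could call something like `newId()` whenever a new item is identified. The trade-off is that you get an opaque string rather than a small increasing integer.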

For good general information on distributed ID generation, see this Stack Overflow question.

Upvotes: 2
