Deathbob
Deathbob

Reputation: 1087

Generating a Unique ID in c++

What is the best way to generate a Unique ID from two (or more) short ints in C++? I am trying to uniquely identify vertices in a graph. The vertices contain two to four short ints as data, and ideally the ID would be some kind of a hash of them. Prefer portability and uniqueness over speed or ease.

There are a lot of great answers here, I will be trying them all tonight to see what fits my problem the best. A few more words on what I'm doing.

The graph is a collection of samples from an audio file. I use the graph as a Markov Chain to generate a new audio file from the old file. Since each vertex stores a few samples and points to another sample, and the samples are all short ints, it seemed natural to generate an ID from the data. Combining them into a long long sounds good, but maybe something as simple as just a 0 1 2 3 generateID is all I need. not sure how much space is necessary to guarantee uniqueness, if each vertex stores 2 16 bit samples, there are 2^32 possible combinations correct? and so if each vertex stores 4 samples, there are 2^64 possible combinations?

Library and platform specific solutions not really relevant to this question. I don't want anyone else who might compile my program to have to download additional libraries or change the code to suit their OS.

Upvotes: 9

Views: 34958

Answers (11)

Mubashar M
Mubashar M

Reputation: 68

Implementing your own hashing can be tedious and prone to some issues which are hard to debug and resolve when you have rolled out or partially rolled out your system. A much better implementation for Unique Ids is already present in windows API. You can see more details here;

https://learn.microsoft.com/en-us/windows/win32/api/guiddef/ns-guiddef-guid

Upvotes: 0

Arslan Tariq
Arslan Tariq

Reputation: 11

Try using this:

int generateID()
{
    static int s_itemID{ 0 };
    return s_itemID++; // makes copy of s_itemID,
                         increments the real s_itemID, 
                         then returns the value in the copy
}

This from here.

Upvotes: 0

Doug T.
Doug T.

Reputation: 65589

A simple solution is to use a 64 bit integer where the lower 16 bits is the first vertex coordinate, next 16 bits is the second, and so on. This will be unique for all your vertices, though not very compact.

So here's some half-assed code to do this. Hopefully I got the casts right.

uint64_t generateId( uint16_t v1, uint16_t v2, uint16_t v3, uint16_t v4)
{ 
   uint64_t id;
   id = v1 | (((uint64_t)v2) << 16) | (((uint64_t)v3) << 32) | (((uint64_t)v4) << 48);
   return id;
}

Optionally this could be done with a union (great idea from Leon Timmermans, see comment). Very clean this way:

struct vertex
{
    uint16_t v1;
    uint16_t v2;
    uint16_t v3;
    uint16_t v4;
};

union vertexWithId
{
    vertex v;
    uint64_t id;
};

int main()
{
    vertexWithId vWithId;
    // Setup your vertices
    vWithId.v.v1 = 2;
    vWithId.v.v2 = 5;

    // Your id is automatically setup for you!
    std::cout << "Id is " << vWithId.id << std::endl;
    return 0;
}

Upvotes: 5

Fire Lancer
Fire Lancer

Reputation: 30105

Well the only way to guarantee the ID is unique, is to make have more id combinations than what your gettings ids from

eg for 2 shorts (assuming 16bit), you should use a 32bit int

int ID = ((int)short1 << 16) | short2;

and for 4 shorts you would need a 64bit int, etc...

With basically anything else collisions (multiple things may get the same id) are pretty much guaranteed.

However a different approach (which I think would be better)to get ids would be to hand them out as vertices are inserted:

unsigned LastId = 0;//global

unsigned GetNewId(){return ++LastId;}

This also has the effect of allowing you to add more/different data to each vertex. However if you expect to create more than 2^32 vertices without resetting it, this is probably not the best method.

Upvotes: 0

bk1e
bk1e

Reputation: 24328

If you're building a hash table in which to store your vertices, I can think of a couple of ways to avoid collisions:

  1. Generate IDs directly from the input data without throwing any bits away, and use a hash table that is large enough to hold all possible IDs. With 64-bit IDs, the latter will be extremely problematic: you will have to use a table that is smaller than your range of IDs, therefore you will have to deal with collisions. Even with 32-bit IDs, you would need well over 4GB of RAM to pull this off without collisions.
  2. Generate IDs sequentially as you read in the vertices. Unfortunately, this makes it very expensive to search for previously read vertices in order to update their probabilities, since a sequential ID generator is not a hash function. If the amount of data used to construct the Markov chain is significantly smaller than the amount of data that the Markov chain is used to generate (or if they are both small), this may not be an issue.

Alternatively, you could use a hash table implementation that handles collisions for you (such as unordered_map/hash_map), and concentrate on the rest of your application.

Upvotes: 0

Jeroen Dirks
Jeroen Dirks

Reputation: 7877

Sometimes the simplest things works best.

Can you just add an id field to the Vertex object and assign it a number in order of construction?

static int sNextId = 0;
int getNextId() { return ++sNextId; }

Upvotes: 10

Peteris Krumins
Peteris Krumins

Reputation:

If you are on Windows, you could useCoCreateGUID API, on Linux you can use /proc/sys/kernel/random/uuid, you can also look at 'libuuid'.

Upvotes: 0

xtofl
xtofl

Reputation: 41509

The definition of the "ID" in the question isn't really clear: do you need to use it as a key for fast Vertex lookup? You could define a comparator for the std::map (see below for an example)

Do you need to be able to differentiate between two Vertex objects with the same coordinates (but different in another field)? Define some 'id factory' (cfr. the singleton pattern) that generates e.g. a sequence of ints, unrelated to the values of the Vertex objects. - Much in the way Fire Lancer suggests (but beware of thread-safety issues!)

In my opinion, two vertices with identical coordinates are identical. So why would you even need an extra ID?

As soon as you define a 'strict weak ordering' on this type, you can use it as a key in e.g. an std::map,

struct Vertex {
  typedef short int Value;
  Value v1, v2;

  bool operator<( const Vertex& other ) const {
    return v1 < other.v1 || ( v1 == other.v1 && v2 < other.v2 ) ;
};

Vertex x1 = { 1, 2 };
Vertex x2 = { 1, 3 };
Vertex y1 = { 1, 2 }; // too!

typedef std::set<Vertex> t_vertices;

t_vertices vertices;
vertices.insert( x1 );
vertices.insert( x2 );
vertices.insert( y1 ); // won't do a thing since { 1, 2 } is already in the set.

typedef std::map<Vertex, int> t_vertex_to_counter;
t_vertex_to_counter count;
count[ x1 ]++;
assert( count[x1] == 1 );
assert( count[y1] == 1 );
count[ x2 ]++;
count[ y1 ]++; 
assert( count[x1] == 2 );
assert( count[y1] == 2 );

Upvotes: 0

David Dolson
David Dolson

Reputation: 310

If you prefer the portability, then boost::tuple is nice:

You would want a tuple of 4 items:

typedef boost::tuple<uint16,uint16,uint16,uint16> VertexID;

You can assign like this:

VertexID id = boost::make_tuple(1,2,3,4);

The boost tuple already has support for comparison, equality, etc., so it is easy to use in containers and algorithms.

Upvotes: 0

basszero
basszero

Reputation: 30004

off the cuff I'd say use prime numbers,

id = 3 * value1 + 5 * value2 + .... + somePrime * valueN

Make sure you don't overflow your id space (long? long long?). Since you've got a fixed number of values just crap some random primes. Don't bother generating them, there are enough available in lists to get you going for awhile.

I'm a little sketchy on the proof though, maybe someone more mathmatical can hook me up. Probably has something to do with unique prime factorization of a number.

Upvotes: -1

sean
sean

Reputation:

use a long long so you can store all 4 possibilities, then bitshift each short:

((long long)shortNumberX) << 0, 4, 8, or 12

make sure you cast before shifting, or your data could drop off the end.

Edit: forgot to add, you should OR them together.

Upvotes: 0

Related Questions