relot

Reputation: 701

Save a Hash Value as a Tensor in pytorch

I have a dataset that contains identifiers that are saved as strings.

I want to create a neural net that receives, among other things, these identifiers as labels and then checks whether two identifiers are exactly the same. If they are the same, I want to increase the loss when the network predicts wrong values.

As an example an identifier looks like this ec2c1cc2410a4e259aa9c12756e1d6e

It's always 32 characters long and uses only hexadecimal characters (0-9a-f).

I want to work with this value in pytorch and save it as a tensor, but I get the following problem:

decimal_identifier = int(string_id, 16)
tensor_id = torch.tensor(decimal_identifier)

RuntimeError: Overflow when unpacking long

So I can't convert the value into a decimal integer because it is too big. Any idea how I could fix this? I know that it's always 32 chars, but I haven't found a char tensor in pytorch.

How can I feed this unique identifier into my neural net?

Upvotes: 2

Views: 1450

Answers (1)

Shai

Reputation: 114906

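The problem is that int(string_id, 16) converts your 32-character hash into a single integer. A 32-digit hexadecimal number can need up to 128 bits, far more than the 64 bits an int64 tensor element can hold, hence the overflow.
You can, instead, convert it to an array with one entry per hex digit: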

tensor_id = torch.tensor([int(c, 16) for c in string_id])

This results in (for your example):

tensor([14, 12,  2, 12,  1, 12, 12,  2,  4,  1,  0, 10,  4, 14,  2,  5,  9, 10,
        10,  9, 12,  1,  2,  7,  5,  6, 14,  1, 13,  6, 14])
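Since the goal in the question is to check whether two identifiers are exactly the same, here is a minimal sketch of how the per-digit encoding could be compared in a batch. The helper name hex_to_digits and the sample identifiers are made up for illustration; they are not from the question.

```python
import torch

def hex_to_digits(string_id: str) -> torch.Tensor:
    """Encode each hex character as one integer in the range 0..15."""
    return torch.tensor([int(c, 16) for c in string_id], dtype=torch.int64)

# Hypothetical identifiers, shortened to 16 characters for readability.
ids_a = ["0123456789abcdef", "ffffffffffffffff"]
ids_b = ["0123456789abcdef", "fffffffffffffff0"]

# Stack the per-identifier tensors into batches of shape (2, 16).
batch_a = torch.stack([hex_to_digits(s) for s in ids_a])
batch_b = torch.stack([hex_to_digits(s) for s in ids_b])

# Two identifiers are equal only if every digit position matches.
same = (batch_a == batch_b).all(dim=1)
print(same)
```

The resulting boolean mask has one entry per pair and could, for instance, be used to weight a per-sample loss.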

You can also group the hex digits 8 at a time, so that each group is a 32-bit value that fits easily in an int64 tensor:

torch.tensor([int(string_id[i:i+8], 16) for i in range(0, len(string_id), 8)], dtype=torch.int64)
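As a rough sketch, the grouped encoding can be wrapped in a pair of helpers so the original string can be recovered from the tensor. The names hex_to_chunks and chunks_to_hex and the sample identifier are assumptions made for this example, and the inverse only works when the string length is a multiple of the group width.

```python
import torch

def hex_to_chunks(string_id: str, width: int = 8) -> torch.Tensor:
    """Split a hex string into groups of `width` digits, one int64 per group."""
    return torch.tensor(
        [int(string_id[i:i + width], 16) for i in range(0, len(string_id), width)],
        dtype=torch.int64,
    )

def chunks_to_hex(chunks: torch.Tensor, width: int = 8) -> str:
    """Inverse of hex_to_chunks; zero-pads each group back to `width` digits."""
    return "".join(format(v, f"0{width}x") for v in chunks.tolist())

# Hypothetical 32-character identifier, made up for this example.
example_id = "00000000ffffffff0123456789abcdef"

chunks = hex_to_chunks(example_id)
restored = chunks_to_hex(chunks)
print(chunks.shape, restored == example_id)
```

With 8 digits per group, a 32-character identifier becomes a tensor of 4 values instead of 32. Groups of 16 digits would not be safe, because a 64-bit unsigned value can exceed the signed int64 range.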

Upvotes: 1
