I have a dataset that contains identifiers stored as strings.
I want to create a neural net that receives, among other things, these identifiers as labels and then checks whether two identifiers are exactly the same. If they are the same, I want to increase the loss when the network predicts wrong values.
As an example, an identifier looks like this: ec2c1cc2410a4e259aa9c12756e1d6e
It is always 32 characters long and uses hexadecimal characters (0-9a-f).
I want to work with this value in PyTorch and store it as a tensor, but I get the following problem:
import torch
decimal_identifier = int(string_id, 16)  # a single, very large Python int
tensor_id = torch.tensor(decimal_identifier)  # fails with the error below
RuntimeError: Overflow when unpacking long
So I can't store the value as a single integer in a tensor, because it is too big.
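A quick check in plain Python (no PyTorch needed) shows the size of the problem. This sketch assumes the limit of a signed 64-bit integer (2**63 - 1), which is the range of torch.int64; INT64_MAX is just a local name used here for illustration.

```python
# Compare the identifier's bit length with the range of a signed 64-bit integer.
INT64_MAX = 2**63 - 1  # largest value a torch.int64 element can hold

string_id = "ec2c1cc2410a4e259aa9c12756e1d6e"  # example identifier from the question
value = int(string_id, 16)  # Python ints have arbitrary precision, so this works

print(value.bit_length())  # 124: bits needed for this example
print(value > INT64_MAX)   # True: too large for one int64 element

# A full 32-digit hex string needs up to 32 * 4 = 128 bits.
print(int("f" * 32, 16).bit_length())  # 128
```

Python itself has no problem with the big integer; the overflow only happens when PyTorch tries to pack it into a fixed-width 64-bit element.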
Any idea how I could fix this?
I know it is always 32 characters, but I haven't found a char tensor in PyTorch.
How can I feed this unique identifier into my neural net?
The problem is that int(string_id, 16) converts your 32-character hash into a single integer, and that is a very, VERY large number: up to 128 bits, while a torch.int64 element holds at most 64 bits.
Instead, you can convert it to a tensor with one value per hex digit:
tensor_id = torch.tensor([int(c, 16) for c in string_id])
Result (for your example):
tensor([14, 12, 2, 12, 1, 12, 12, 2, 4, 1, 0, 10, 4, 14, 2, 5, 9, 10, 10, 9, 12, 1, 2, 7, 5, 6, 14, 1, 13, 6, 14])
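To actually feed the per-digit tensor into a network, one option (an assumption on my part, not the only approach; nn.Embedding would also work) is to one-hot encode each digit with torch.nn.functional.one_hot, so every character becomes a 16-dimensional vector. The helper name hex_to_digits is hypothetical.

```python
import torch
import torch.nn.functional as F


def hex_to_digits(string_id: str) -> torch.Tensor:
    """Hypothetical helper: map each hex character to its value 0..15 (int64)."""
    return torch.tensor([int(c, 16) for c in string_id], dtype=torch.int64)


string_id = "ec2c1cc2410a4e259aa9c12756e1d6e"
digits = hex_to_digits(string_id)

# One-hot encode: shape (num_chars, 16); .float() so it can go into nn.Linear etc.
one_hot = F.one_hot(digits, num_classes=16).float()
print(one_hot.shape)  # torch.Size([31, 16]) for this example string

# Flatten if the model expects a single feature vector per identifier.
features = one_hot.flatten()
print(features.shape)  # torch.Size([496])
```

One-hot vectors avoid implying that digit 14 is "bigger" than digit 2, which raw digit values fed into a linear layer would suggest.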
You can also group the hex digits 8 at a time, so that each group fits in an int64:
torch.tensor([int(string_id[i:i+8], 16) for i in range(0, len(string_id), 8)], dtype=torch.int64)
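Since the goal is to check whether two identifiers are exactly the same, the grouped int64 tensor can be compared directly, either with torch.equal for a single pair or row by row for a batch. This is a sketch: the helper name hex_to_int64_groups is hypothetical, and it assumes all identifiers have the same fixed length (with variable lengths, different strings such as "0f" and "f" would map to the same group values).

```python
import torch


def hex_to_int64_groups(string_id: str, group: int = 8) -> torch.Tensor:
    """Hypothetical helper: split the hex string into chunks of `group` digits.

    8 hex digits = 32 bits, which always fits in a signed int64.
    """
    return torch.tensor(
        [int(string_id[i:i + group], 16) for i in range(0, len(string_id), group)],
        dtype=torch.int64,
    )


a = hex_to_int64_groups("ec2c1cc2410a4e259aa9c12756e1d6e")
b = hex_to_int64_groups("ec2c1cc2410a4e259aa9c12756e1d6e")
c = hex_to_int64_groups("ec2c1cc2410a4e259aa9c12756e1d6f")  # last digit differs

print(torch.equal(a, b))  # True: identical identifiers give identical tensors
print(torch.equal(a, c))  # False

# Batched version: one boolean per row, usable e.g. as a mask when weighting a loss.
batch_a = torch.stack([a, a])
batch_b = torch.stack([b, c])
same = (batch_a == batch_b).all(dim=1)
print(same)  # first row True, second row False
```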