slobokv83

Reputation: 173

Convert values in column from hex to binary in pandas data frame

I have a column in a pandas DataFrame that holds hex values, for example:

Data
1A
2B
BB
FF
A7
78
CB

I want to convert the hex values to binary, take the last 3 bits (the three lowest-order bits) of each binary value, and finally convert those 3 bits to decimal.

In binary, the Data column is:

Data
00011010
00101011
10111011
11111111
10100111
01111000
11001011

the last 3 bits:

Data
010
011
011
111
111
000
011

and finally the desired value in decimal:

Data
2
3
3
7
7
0
3

How can I do this? I tried the bin() function, but it doesn't work on a whole pandas column.
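For reference, a minimal setup that reproduces the sample column, assuming the column is named `Data` and holds the hex values as strings. It also illustrates why `bin()` alone fails: it expects a single integer, not a Series, so the conversion has to be applied element by element.

```python
import pandas as pd

# Sample data from the question; the hex values are stored as strings.
df = pd.DataFrame({"Data": ["1A", "2B", "BB", "FF", "A7", "78", "CB"]})

# bin() accepts a single int, so calling it on the whole Series raises TypeError.
try:
    bin(df["Data"])
except TypeError as exc:
    print("bin() on a Series fails:", exc)

# Element-wise conversion works: parse each hex string, then format it as 8-bit binary.
binary = df["Data"].apply(lambda h: format(int(h, 16), "08b"))
print(binary.tolist())
```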

Upvotes: 2

Views: 5234

Answers (2)

willeM_ Van Onsem

Reputation: 477210

We can do this by a chain of actions:

  1. first we convert the hexadecimal number to an int with .apply(int, base=16);
  2. next we convert this to binary data, with .apply(bin);
  3. next we strip the '0b' prefix (the first two characters) with .str[2:];
  4. then we obtain the last three characters with .str[-3:]; and
  5. finally we again interpret these as ints, with .apply(int, base=2).

So:

>>> df.Data.apply(int, base=16).apply(bin).str[2:].str[-3:].apply(int, base=2)
0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64
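To see what each link in the chain does, here is a sketch that keeps every intermediate Series, rebuilding the sample column under the assumption that it is named `Data`. The comments trace the first row, `'1A'`.

```python
import pandas as pd

df = pd.DataFrame({"Data": ["1A", "2B", "BB", "FF", "A7", "78", "CB"]})

as_int = df.Data.apply(int, base=16)   # '1A' -> 26
as_bin = as_int.apply(bin)             # 26 -> '0b11010'
digits = as_bin.str[2:]                # '0b11010' -> '11010' (prefix removed)
last3 = digits.str[-3:]                # '11010' -> '010'
result = last3.apply(int, base=2)      # '010' -> 2

print(as_int[0], as_bin[0], digits[0], last3[0], result[0])  # 26 0b11010 11010 010 2
```

Stripping the prefix matters for small numbers: `bin(2)` is `'0b10'`, and taking `[-3:]` of that directly would give `'b10'`, which `int(..., base=2)` rejects.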

We can however use another strategy here:

  1. we first convert the hexadecimal number to an int; and
  2. then we apply a bitwise AND with the mask 0b111 (decimal 7), which keeps only the three lowest bits.

for example:

>>> df.Data.apply(int, base=16) & 0b111
0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64
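A worked example of the mask on the first value, `0x1A`, in plain Python: the AND clears every bit except the three lowest. For non-negative integers this is the same as taking the remainder modulo 8.

```python
value = int("1A", 16)          # 26, i.e. 0b00011010
mask = 0b111                   # 7,  i.e. 0b00000111
low3 = value & mask            # 0b00000010
print(value, bin(low3), low3)  # 26 0b10 2

# For non-negative ints, masking with 0b111 equals value % 8.
assert low3 == value % 8
```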

The second approach is not only simpler but also faster: it needs roughly a third of the time of the first (about 66% less):

>>> timeit(first_strategy, number=10000)
6.962630775000434
>>> timeit(second_strategy, number=10000)
2.330652763019316

For a DataFrame that repeats the sample data 100 times, we get:

>>> timeit(first_strategy, number=10000)
17.603060900000855
>>> timeit(second_strategy, number=10000)
5.901462858979357

Again, the second approach takes about a third of the time of the first.
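The answer does not show how `first_strategy` and `second_strategy` were defined. Below is a minimal sketch of a comparable benchmark, assuming both are zero-argument functions wrapping the two one-liners above; the names come from the answer, but the function bodies are an assumption. Absolute timings will vary with the machine and the pandas version.

```python
from timeit import timeit

import pandas as pd

df = pd.DataFrame({"Data": ["1A", "2B", "BB", "FF", "A7", "78", "CB"]})

def first_strategy():
    # Hypothetical wrapper: hex -> int -> '0b...' string -> last 3 chars -> int.
    return df.Data.apply(int, base=16).apply(bin).str[2:].str[-3:].apply(int, base=2)

def second_strategy():
    # Hypothetical wrapper: hex -> int, then keep the 3 lowest bits with a mask.
    return df.Data.apply(int, base=16) & 0b111

# Make sure both approaches agree before comparing their speed.
assert first_strategy().tolist() == second_strategy().tolist()

# Fewer repetitions than in the answer, to keep the run short.
print("first: ", timeit(first_strategy, number=1000))
print("second:", timeit(second_strategy, number=1000))
```

To mimic the larger benchmark, the sample data can be repeated with `df = pd.concat([df] * 100, ignore_index=True)` before timing.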

Upvotes: 6

Jon Clements

Reputation: 142206

You can use:

df.Data.apply(lambda v: int(format(int(v, 16), '08b')[-3:], 2))

Which gives you:

0    2
1    3
2    3
3    7
4    7
5    0
6    3
Name: Data, dtype: int64

Those steps are:

  • Take your original hex string and convert it to an integer with int(number, 16) (base 16 is hex): int('1A', 16) == 26
  • Format that integer as a binary string: format(number, '08b') gives a string of 0s and 1s, zero-padded on the left to 8 characters: format(26, '08b') == '00011010'
  • Take the last 3 characters of that string with [-3:] ('010') and convert them back to an integer in base 2: int(binary_string[-3:], 2) gives 2
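The three steps can be checked one at a time on the first value, `'1A'`. This is plain Python with no pandas involved, and the variable names are illustrative only.

```python
number = int("1A", 16)                 # hex string -> int
binary_string = format(number, "08b")  # int -> 8-character, zero-padded binary string
last_three = binary_string[-3:]        # keep the 3 rightmost (lowest-order) bits
value = int(last_three, 2)             # binary string -> int

print(number, binary_string, last_three, value)  # 26 00011010 010 2
```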

Upvotes: 2
