Reputation: 1
I have a DataFrame with a column of characters that I want to convert to integers. Some of the values are longer than one character, e.g. '\b'.
Calling ord() on a single value by itself works fine, but applying it to the DataFrame column raises a TypeError:
ft_x['keyCode'].apply(lambda row : ord(row))
TypeError: ord() expected a character, but string of length 2 found
The TypeError is thrown when the character '\b' is reached.
Just doing this however works as expected:
x = '\b'
ord(x)
8
What am I missing here?
When applied to the DataFrame elements, ord() seems to treat values like '\b' as two-character strings rather than as the escape sequences they represent (backspace, in the case of '\b').
For reference, I'm working on this dataset: https://ieee-dataport.org/open-access/emosurv-typing-biometric-keystroke-dynamics-dataset-emotion-labels-created-using. I'm using FixedTextTypingDataset.csv and need the keyCode column as integers.
print(ft_x.loc[ft_x['keyCode'].str.len() > 1, 'keyCode'].head(5))
33        \b
34    \u0010
35    \u0010
36    \u0010
37    \u0010
Name: keyCode, dtype: object
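The output above suggests that the column stores the escape sequences as literal text, which would explain the error. A minimal sketch of the difference, assuming the CSV cell holds a backslash followed by the letter b rather than an actual backspace:

```python
literal = '\b'     # Python source literal: parsed into one backspace character
from_csv = r'\b'   # assumed CSV content: backslash + 'b', two characters

print(len(literal), ord(literal))   # 1 8
print(len(from_csv))                # 2

try:
    ord(from_csv)
except TypeError as exc:
    # Same failure as in the DataFrame: ord() only accepts one character
    print(exc)
```

In other words, Python only turns '\b' into a backspace when it parses source code; text read from a file keeps both characters.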
Upvotes: 0
Views: 651
Reputation: 1
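Converting the multi-character strings to bytes and then to integers gives an integer for every value, although for the escaped strings it is not the code point of the character they denote.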
def char_to_int(char):
    # Ignore NaNs (pandas represents missing values as floats)
    if isinstance(char, float):
        return None
    if len(char) == 1:
        return ord(char)
    # Multi-character string: read its UTF-8 bytes as one big-endian integer
    return int.from_bytes(char.encode(), byteorder='big')

ft_x['keyCode'] = ft_x['keyCode'].apply(char_to_int)
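To see what the bytes route returns for an escaped value, a quick check may help. This sketch assumes the cell holds the two characters backslash and b, as the question's output suggests:

```python
escaped = r'\b'  # assumption: the CSV cell holds backslash + 'b', not a backspace

as_bytes = escaped.encode()                          # two bytes: 0x5C 0x62
as_int = int.from_bytes(as_bytes, byteorder='big')

print(as_int)      # 0x5C62 == 23650
print(ord('\b'))   # 8, the code point of the actual backspace character
```

The integers are distinct per string, which may be enough for use as a feature, but they are not key codes.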
Upvotes: 0
Reputation: 120519
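Your strings contain literal escape sequences (a backslash followed by b, for example), not the characters they stand for.
Decode them with the unicode-escape codec first, then map ord over the result: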
ft_x['ord'] = (ft_x['keyCode'].str.encode('utf-8')
                              .str.decode('unicode-escape')
                              .map(ord))

# Output
  keyCode  ord
0       a   97
1      \b    8
2  \u0030   48
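If the column also contains missing values, calling ord on a NaN would raise a TypeError. A minimal sketch of a per-value helper using only the standard library follows; the name escaped_to_code is hypothetical, and it assumes the values are ASCII characters or backslash escapes (the unicode_escape codec reads bytes as Latin-1, so other non-ASCII text would be mangled):

```python
import math


def escaped_to_code(value):
    # Missing values arrive from pandas as float NaN; pass them through unchanged
    if isinstance(value, float) and math.isnan(value):
        return value
    # Turn a literal escape such as backslash + 'b' into the character it denotes
    decoded = value.encode('utf-8').decode('unicode_escape')
    # ord() raises TypeError if the decoded text is not exactly one character
    return ord(decoded)


print(escaped_to_code('a'))        # 97
print(escaped_to_code(r'\b'))      # 8
print(escaped_to_code(r'\u0010'))  # 16
```

It could then be applied with ft_x['keyCode'].map(escaped_to_code).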
Upvotes: 1