Tomat0

Reputation: 1

Converting a DataFrame column of characters to integer representation

I have a DataFrame with a column of characters that I want to convert to integers. Some of the values are escape sequences such as '\b'. Applying ord() to the column raises a TypeError.

ord() works fine when I call it by itself, but it raises a TypeError when I apply it to the DataFrame column:

ft_x['keyCode'].apply(lambda row : ord(row))

TypeError: ord() expected a character, but string of length 2 found

The TypeError is thrown when the character '\b' is reached.

However, just doing this works as expected:

x = '\b'
ord(x)

8

What am I missing here?

It looks as though, when run on the DataFrame elements, ord() sees values like '\b' as multi-character strings instead of what they represent: single characters, such as backspace in the case of '\b'.

For reference, I'm working with this dataset: https://ieee-dataport.org/open-access/emosurv-typing-biometric-keystroke-dynamics-dataset-emotion-labels-created-using, using FixedTextTypingDataset.csv. I need the keyCode column as integers.

print(ft_x.loc[ft_x['keyCode'].str.len() > 1, 'keyCode'].head(5))
33        \b
34    \u0010
35    \u0010
36    \u0010
37    \u0010
Name: keyCode, dtype: object
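
The output above suggests the column stores the escape sequences as literal text, so each value is a backslash followed by more characters. A minimal sketch of the difference, assuming that reading is right (the variable names are illustrative only):

```python
# In Python source code, '\b' is an escape sequence for one character (backspace)
one_char = '\b'

# Text read from a file keeps the backslash and the letter as two separate
# characters; a raw string literal reproduces that here
two_chars = r'\b'

print(len(one_char), ord(one_char))  # 1 8
print(len(two_chars))                # 2

try:
    ord(two_chars)
except TypeError as exc:
    # ord() only accepts a string of length 1
    print(exc)
```

This would explain why ord(x) succeeds on a literal typed into the interpreter but fails on the values read from the CSV.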

Upvotes: 0

Views: 651

Answers (2)

Tomat0

Reputation: 1

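Encoding the multi-character strings to bytes and then converting those bytes to an integer works. Note that for escaped values this yields a number built from the raw bytes of the text (the backslash and the characters that follow), not the code point of the character the escape stands for.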

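def char_to_int(char):
    # Missing values arrive as float NaN; leave them unconverted (returns None)
    if isinstance(char, float):
        return None
    # A single character maps directly to its code point
    if len(char) == 1:
        return ord(char)
    # Longer strings: pack their UTF-8 bytes into one big-endian integer
    return int.from_bytes(char.encode(), byteorder='big')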

ft_x['keyCode'] = ft_x['keyCode'].apply(char_to_int)
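
To see what the byte-packing branch produces, here is a small sketch comparing it with decoding the escape first. The sample value is hypothetical and assumes the column holds the two-character text of the escape:

```python
# Hypothetical sample value: the two characters backslash and 'b',
# as they would be stored in the CSV text
escaped = r'\b'

# Packing the UTF-8 bytes (0x5C, 0x62) into one big-endian integer
packed = int.from_bytes(escaped.encode(), byteorder='big')

# Interpreting the escape sequence first, then taking the code point
decoded = escaped.encode('utf-8').decode('unicode_escape')
code_point = ord(decoded)

print(packed)      # 23650
print(code_point)  # 8
```

Both give a distinct integer per key, so either may be usable as a feature, but only the second matches the value ord('\b') returns.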

Upvotes: 0

Corralien

Reputation: 120519

The values are stored as escaped text, so decode the escape sequences first and then map ord over the resulting single characters:

ft_x['ord'] = (ft_x['keyCode'].str.encode('utf-8')
                              .str.decode('unicode-escape')
                              .map(ord))

# Output
  keyCode  ord
0       a   97
1      \b    8
2  \u0030   48
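
A self-contained sketch of the same chain, using hypothetical sample data in place of the real keyCode column and assuming missing values should be kept as missing rather than raise an error:

```python
import pandas as pd

# Hypothetical sample data standing in for the keyCode column;
# raw strings keep the backslashes as literal text, as in the CSV
ft_x = pd.DataFrame({'keyCode': ['a', r'\b', r'\u0030', float('nan')]})

codes = (ft_x['keyCode'].str.encode('utf-8')
                        .str.decode('unicode-escape')
                        .map(ord, na_action='ignore'))  # skip missing values

# Nullable integer dtype keeps whole numbers alongside missing entries
ft_x['ord'] = codes.astype('Int64')
print(ft_x)
```

Without na_action='ignore', ord would be called on the missing value and raise a TypeError.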

Upvotes: 1
