Reputation: 69
How can I apply a complex manipulation to a pandas column to create a new column? For example:
import pandas as pd
import ast
d = {'col1' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'col2' : pd.Series(['[9, 10]', '[10, 11]', '[11, 12]', '[12,13]'],
index=['a', 'b', 'c', 'd'])
}
df = pd.DataFrame(d)
print(df)
So the values in the last column are actually strings, but I want to convert them to lists.
I tried:
df['new'] = ast.literal_eval(df['col2'])
which throws an error.
I have tried a lot of other things and couldn't get anything working.
I suppose there is another way to answer this question:
In a previous file, I created my df with lists as the elements of the column and then saved it to CSV. When I opened the CSV file, the lists were interpreted as strings. So another solution would be to save the original DataFrame in a way that preserves the lists.
Upvotes: 3
Views: 647
Reputation: 294566
json.loads works because your lists are valid JSON. You can use the json module already imported in pandas:
df.assign(new=df.col2.apply(pd.io.json.loads))
col1 col2 new
a 1 [9, 10] [9, 10]
b 2 [10, 11] [10, 11]
c 3 [11, 12] [11, 12]
d 4 [12,13] [12, 13]
print(type(df.assign(new=df.col2.apply(pd.io.json.loads)).iloc[0, -1]))
<class 'list'>
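The same idea can be sketched with the standard-library json module, which avoids relying on pd.io.json.loads (an undocumented helper that may not be exposed in every pandas version). The sketch also shows why the attempt in the question fails: ast.literal_eval expects a single string, not a whole Series, so the parser has to be applied to each element. The frame below is rebuilt from the question's data; the names via_json and via_ast are only illustrative.

```python
import ast
import json

import pandas as pd

# Rebuild the frame from the question: col2 holds strings that look like lists.
df = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4],
        "col2": ["[9, 10]", "[10, 11]", "[11, 12]", "[12,13]"],
    },
    index=["a", "b", "c", "d"],
)

# Apply the parser to each element instead of passing the Series itself.
via_json = df["col2"].apply(json.loads)
via_ast = df["col2"].apply(ast.literal_eval)

# assign returns a new frame with the parsed lists in column "new".
df = df.assign(new=via_json)
print(type(df.loc["a", "new"]))
```

Both parsers give the same lists here. json.loads only accepts valid JSON, so strings such as "['x', 'y']" (single quotes) would need ast.literal_eval instead.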
For whatever reason, json parsing is faster than literal_eval: about equal on the small frame, but roughly 18 times faster on the large one.
%timeit df.assign(new=df.col2.apply(pd.io.json.loads))
%timeit df.assign(new=df.col2.apply(ast.literal_eval))
%timeit df.assign(new=[ast.literal_eval(x) for x in df['col2']])
Small data (the 4-row frame above):
1000 loops, best of 3: 410 µs per loop
1000 loops, best of 3: 468 µs per loop
1000 loops, best of 3: 397 µs per loop
Large data (40,000 rows):
df = pd.concat([df] * 10000, ignore_index=True)
100 loops, best of 3: 17.9 ms per loop
1 loop, best of 3: 333 ms per loop
1 loop, best of 3: 331 ms per loop
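On the other half of the question (getting lists back after a CSV round trip), one possible approach is to parse the column while reading, using the converters argument of pd.read_csv. This is a minimal sketch, assuming the lists hold only JSON-compatible values such as numbers; the in-memory buffer stands in for a real file path, and the name restored is only illustrative.

```python
import io
import json

import pandas as pd

# A frame whose col2 cells are real Python lists.
df = pd.DataFrame({"col1": [1, 2], "col2": [[9, 10], [10, 11]]})

# Writing to CSV stores each list as its string form, e.g. "[9, 10]".
buffer = io.StringIO()  # stand-in for a file on disk
df.to_csv(buffer, index=False)
buffer.seek(0)

# converters runs json.loads on every raw cell of col2 during parsing,
# so the column comes back as lists rather than strings.
restored = pd.read_csv(buffer, converters={"col2": json.loads})
print(type(restored.loc[0, "col2"]))
```

If the lists contain strings, their CSV form uses single quotes and is not valid JSON; passing ast.literal_eval as the converter would be the slower fallback in that case.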
Upvotes: 3