mmvw

Reputation: 69

Pandas manipulation of one column into a new column

How can I apply a complex manipulation to a pandas column and store the result in a new column? For example:

import pandas as pd
import ast

d = {'col1' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']), 
     'col2' : pd.Series(['[9, 10]', '[10, 11]', '[11, 12]', '[12,13]'],
              index=['a', 'b', 'c', 'd'])
    }
df = pd.DataFrame(d)
print(df)

The last column actually holds strings, but I want to convert each one to a list.

I tried:

df['new'] = ast.literal_eval(df['col2'])

which throws an error.

I have tried a lot of other things and couldn't get anything to work.

I suppose there is another way to approach this question:

In a previous file, I created my df with lists as the elements of the column and then saved it to CSV. When I open the CSV file, the lists are read back as strings. So another solution would be to save the original DataFrame in a way that preserves the lists.

Upvotes: 3

Views: 647

Answers (2)

piRSquared

Reputation: 294566

json.loads works because your lists are valid JSON. You can use the JSON parser that pandas already exposes:

df.assign(new=df.col2.apply(pd.io.json.loads))

   col1      col2       new
a     1   [9, 10]   [9, 10]
b     2  [10, 11]  [10, 11]
c     3  [11, 12]  [11, 12]
d     4   [12,13]  [12, 13]

print(type(df.assign(new=df.col2.apply(pd.io.json.loads)).iloc[0, -1]))

<class 'list'>
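
If pd.io.json.loads is not available in your pandas version (I believe newer releases no longer expose it under that name, but treat that as an assumption), the standard library json module should do the same job. A minimal sketch, rebuilding the question's DataFrame so it runs on its own:

```python
import json

import pandas as pd

# Same data as in the question: col2 holds strings that look like lists
df = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4],
        "col2": ["[9, 10]", "[10, 11]", "[11, 12]", "[12,13]"],
    },
    index=["a", "b", "c", "d"],
)

# json.loads parses each string into a real Python list
result = df.assign(new=df["col2"].apply(json.loads))
print(type(result.iloc[0, -1]))  # <class 'list'>
```

Note that this only works while the strings are valid JSON. Strings with single-quoted items or tuples would need ast.literal_eval instead.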

For whatever reason, JSON parsing seems faster than ast.literal_eval, and the gap grows with the size of the data:

%timeit df.assign(new=df.col2.apply(pd.io.json.loads))
%timeit df.assign(new=df.col2.apply(ast.literal_eval))
%timeit df.assign(new=[ast.literal_eval(x) for x in df['col2']])

Small data (the original 4 rows):

1000 loops, best of 3: 410 µs per loop
1000 loops, best of 3: 468 µs per loop
1000 loops, best of 3: 397 µs per loop

Large data (40,000 rows):

df = pd.concat([df] * 10000, ignore_index=True)

100 loops, best of 3: 17.9 ms per loop
1 loop, best of 3: 333 ms per loop
1 loop, best of 3: 331 ms per loop
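
To reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should work. It uses the standard library json.loads rather than pd.io.json.loads, and the helper names with_json and with_literal_eval are made up for illustration, so expect different absolute numbers than the ones above.

```python
import ast
import json
import timeit

import pandas as pd

small = pd.DataFrame({"col2": ["[9, 10]", "[10, 11]", "[11, 12]", "[12,13]"]})
large = pd.concat([small] * 10000, ignore_index=True)  # 40,000 rows


def with_json(frame):
    # Hypothetical helper: parse every string with json.loads
    return frame["col2"].apply(json.loads)


def with_literal_eval(frame):
    # Hypothetical helper: parse every string with ast.literal_eval
    return [ast.literal_eval(x) for x in frame["col2"]]


for name, func in [("json.loads", with_json), ("ast.literal_eval", with_literal_eval)]:
    # Average over 3 calls on the large frame
    seconds = timeit.timeit(lambda: func(large), number=3) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```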

Upvotes: 3

jezrael

Reputation: 863751

You need apply or a list comprehension, because ast.literal_eval expects a single string, not a whole Series:

import ast
# Option 1: apply the parser to each element
df['new'] = df['col2'].apply(ast.literal_eval)

# Option 2: list comprehension, same result
df['new'] = [ast.literal_eval(x) for x in df['col2']]

print(type(df.loc['a', 'new']))
<class 'list'>
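
For the CSV round trip mentioned in the question, one option is to do the conversion while reading, via the converters argument of read_csv. A minimal sketch, assuming the column was written with to_csv and using an in-memory buffer in place of a real file:

```python
import ast
import io

import pandas as pd

# A frame whose col2 holds real lists, as in the asker's earlier file
df = pd.DataFrame({"col1": [1, 2], "col2": [[9, 10], [10, 11]]}, index=["a", "b"])

# Write to CSV; the lists are stored as their string representation
buffer = io.StringIO()  # stands in for a file on disk
df.to_csv(buffer)
buffer.seek(0)

# converters runs ast.literal_eval on every value of col2 while parsing
restored = pd.read_csv(buffer, index_col=0, converters={"col2": ast.literal_eval})
print(type(restored.loc["a", "col2"]))  # <class 'list'>
```

If the file does not have to be CSV, to_pickle and read_pickle keep Python objects such as lists intact without any parsing step.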

Upvotes: 2
