Reputation: 367
I have a dataframe (df) of the form:
name  alias  col3
mark  david  ['3109892828','[email protected]','123 main st']
john  twixt  ['5468392873','[email protected]','345 grand st']
What is a concise way to split col3 into new, named columns? (perhaps using lambda and apply)
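For anyone who wants to reproduce this, a frame of the same shape can be built roughly as follows. This is only a sketch: the e-mail strings are placeholders (the real addresses are obscured above), and col3 is assumed to hold actual Python lists rather than strings.

```python
import pandas as pd

# Placeholder e-mail values; the originals are not shown in the post.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com', '345 grand st'],
    ],
})
print(df)
```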
Upvotes: 1
Views: 3251
Reputation: 367
Here's what I came up with. It includes a bit of scrubbing of the raw file, and a conversion to a dictionary.
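import pandas as pd

# Open in text mode so the str methods below also work under Python 3
with open('/path/to/file', 'r') as f:
    data = f.readlines()
# Split each raw line on '}'
data = [line.split('}') for line in data]
data_df = pd.DataFrame(data)
data_dfn = data_df.transpose()
# Strip leading brackets/braces, drop quotes, then split each entry on commas
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'", "").split(','))
s = pd.DataFrame(data_new)
d = dict(data_new)
# Wrapping each list in a Series pads shorter lists with NaN
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()})
D = D.transpose()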
Upvotes: 0
Reputation: 394031
You could join the list elements into a comma-separated string and then call the vectorised str.split with ',' as the separator and expand=True to create the new columns (without an explicit separator, str.split splits on whitespace, which would break the address apart):
In [12]:
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True)
df
Out[12]:
alias col3 name UserID \
0 david [3109892828, [email protected], 123 main st] mark 3109892828
1 twixt [5468392873, [email protected], 345 grand st] john 5468392873
email address
0 [email protected] 123 main st
1 [email protected] 345 grand st
A cleaner method is to apply the pd.Series constructor, which turns each list into a Series:
In [15]:
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series)
df
Out[15]:
alias col3 name UserID \
0 david [3109892828, [email protected], 123 main st] mark 3109892828
1 twixt [5468392873, [email protected], 345 grand st] john 5468392873
email address
0 [email protected] 123 main st
1 [email protected] 345 grand st
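If the frame is large, applying pd.Series row by row can get slow. One alternative, sketched here rather than taken from the question, is to build the new columns in one go from the list of lists. The sample data and e-mail values below are placeholders, and the sketch assumes every list in col3 has exactly three elements.

```python
import pandas as pd

# Sample frame mirroring the question; e-mail values are placeholders.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com', '345 grand st'],
    ],
})

new_cols = ['UserID', 'email', 'address']

# Build all new columns at once from the list of lists, reusing df's index
# so the rows line up when joined back onto the original frame.
expanded = pd.DataFrame(df['col3'].tolist(), index=df.index, columns=new_cols)
df = df.join(expanded)

print(df[new_cols])
```

Passing index=df.index matters when the frame does not have a default 0..n-1 index; otherwise the join would misalign the rows.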
Upvotes: 2