Reputation: 1403
I have a string column in a dataframe like this:
ID   col1
id1  AA's 2015:45,BB:96
id2  Jigga:91,OO:73,BB:34
I want to create a new dataframe out of this which can take the shape:
ID   var1  var2  var3  var4
id1  45    96    0     0
id2  0     34    91    73
where var1 = AA's 2015, var2 = BB, var3 = Jigga, var4 = OO.
I have stored all the distinct names (the part of each entry before the colon) in a list like this:
["AA's 2015","BB","Jigga","OO"]
I want to iterate through this list and, for each value, create a variable var[i] that takes its value from col1 for that particular ID.
I can use a for loop to iterate through the list, but how do I look up the value and put it in var[i]?
Any ideas will be appreciated
Upvotes: 3
Views: 3074
Reputation: 879371
Use apply to manipulate the strings into a pandas Series. The function passed to apply will be called on each string. The returned values, Series, are then merged into a single DataFrame. apply returns this DataFrame.
The DataFrame's column labels come from merging all the Series' indices. The merging also places the Series values in the appropriate columns, which thus yields the desired result:
import pandas as pd

df = pd.DataFrame({'ID': ['id1', 'id2'],
                   'col1': ["AA: 2015:45,BB:96", 'Jigga:91,OO:73,BB:34']})

# Split each string into name:value pairs; rsplit keeps colons inside names intact.
result = df['col1'].apply(lambda x: pd.Series(
    dict([
        item for item in [
            part.rsplit(':', 1) for part in x.split(',')]
        if len(item) > 1  # remove items corresponding to empty strings
    ]))).fillna(0)
# Number the columns var1, var2, ... in their current order.
result = result.rename(columns={name: 'var{}'.format(i) for i, name in
                                enumerate(result.columns, 1)})
result = pd.concat([df[['ID']], result], axis=1)
print(result)
yields
    ID var1 var2 var3 var4
0  id1   45   96    0    0
1  id2    0   34   91   73
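To see the index-merging step in isolation, here is a minimal sketch on toy data (the keys 'a', 'b', 'c' are made up for illustration and are not from the question). Each row's dict becomes a Series, and apply lines the Series up by label, leaving NaN wherever a row lacks a key:

```python
import pandas as pd

# Two rows with different, partly overlapping keys (toy data, not from the question).
s = pd.Series([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Each dict becomes a Series; apply aligns their indices into DataFrame columns.
wide = s.apply(pd.Series)

# Keys missing from a row appear as NaN; fillna(0) replaces them.
filled = wide.fillna(0).astype(int)
print(filled)
```

This is why the fillna(0) call is needed: any name that does not occur in a given row is missing after alignment rather than zero.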
I learned this trick here.
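If you would rather keep the for loop over your existing list of names, as described in the question, something along these lines might work. This is only a sketch: parse_pairs is a hypothetical helper written for this example, and it assumes every value after the last colon is an integer.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['id1', 'id2'],
                   'col1': ["AA's 2015:45,BB:96", 'Jigga:91,OO:73,BB:34']})
names = ["AA's 2015", 'BB', 'Jigga', 'OO']


def parse_pairs(text):
    """Hypothetical helper: turn 'name:value,name:value' into a dict of ints."""
    result = {}
    for part in text.split(','):
        pieces = part.rsplit(':', 1)  # split on the last colon only
        if len(pieces) == 2:          # skip fragments without a value
            result[pieces[0]] = int(pieces[1])
    return result


# Parse each row once, then look each name up in the per-row dicts.
parsed = df['col1'].map(parse_pairs)

out = df[['ID']].copy()
for i, name in enumerate(names, 1):
    # Default to 0 when the name does not occur in that row.
    out['var{}'.format(i)] = parsed.map(lambda d, name=name: d.get(name, 0))
print(out)
```

The loop fixes the column order to the order of the list, which may be preferable when the var numbering has to stay stable across different inputs.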
Upvotes: 2