muni

Reputation: 1403

Iterating through a list to create new columns in a dataframe

I have a string column in a dataframe like this:

ID  col1
id1 AA's 2015:45,BB:96
id2 Jigga:91,OO:73,BB:34

I want to create a new dataframe from this with the following shape:

ID  var1    var2    var3    var4
id1 45      96      0       0
id2 0       34      91      73

where var1 = AA's 2015, var2 = BB, var3 = Jigga, var4 = OO.

I have stored all the distinct keys (the part of each item before the last colon) in a list like this:

["AA's 2015","BB","Jigga","OO"]

I want to iterate through this list and, for each value, create a variable var[i] that takes its value from col1 for that particular ID.

I can use a for loop to iterate through the list, but how do I look up the value and put it in var[i]?

Any ideas will be appreciated.

Upvotes: 3

Views: 3074

Answers (1)

unutbu

Reputation: 879371

Use apply to turn each string into a pandas Series. The function passed to apply is called on each string, and the Series it returns are then merged into a single DataFrame, which apply returns.

The DataFrame's column labels come from merging all the Series' indices. The merging also places the Series values in the appropriate columns, which thus yields the desired result:

import pandas as pd
df = pd.DataFrame({'ID': ['id1', 'id2'], 'col1': ["AA's 2015:45,BB:96", 'Jigga:91,OO:73,BB:34']})

result = df['col1'].apply(lambda x: pd.Series(
    dict([
        item for item in [
            part.rsplit(':',1) for part in x.split(',')] 
         if len(item)>1  # remove items corresponding to empty strings
    ]))).fillna(0)
result = result.rename(columns={name:'var{}'.format(i) for i, name in 
                                enumerate(result.columns, 1)})
result = pd.concat([df[['ID']], result], axis=1)
print(result)

yields

    ID var1 var2 var3 var4
0  id1   45   96    0    0
1  id2    0   34   91   73
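To see the alignment step on its own, here is a minimal sketch with made-up data (the names raw, to_series and wide are only for illustration, not part of the original code). Each call returns a Series indexed by the keys it found, and apply lines those indices up as columns, leaving NaN where a row lacks a key:

```python
import pandas as pd

# Hypothetical toy input: each string holds comma-separated key:value pairs.
raw = pd.Series(["a:1,b:2", "c:3,b:4"])

def to_series(text):
    # Build a Series whose index is the keys found in this one string.
    return pd.Series(dict(part.split(":") for part in text.split(",")))

# Series.apply with a function that returns a Series yields a DataFrame;
# the columns are the union of all keys, and missing entries become NaN.
wide = raw.apply(to_series)
print(wide)
```

The values stay strings here, just as in the answer's code, so convert them explicitly if numeric columns are needed.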

I learned this trick here.
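If you already have the list of names from the question and want the var columns to follow that list's order, a variation along these lines may be closer to what was asked. This is only a sketch: parse_pairs and names are illustrative names, and it assumes every value after the last colon is an integer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["id1", "id2"],
    "col1": ["AA's 2015:45,BB:96", "Jigga:91,OO:73,BB:34"],
})
# The list of distinct keys from the question, in the desired var order.
names = ["AA's 2015", "BB", "Jigga", "OO"]

def parse_pairs(text):
    # Split on commas, then on the last colon of each piece.
    out = {}
    for part in text.split(","):
        pieces = part.rsplit(":", 1)
        if len(pieces) == 2:  # skip empty or malformed pieces
            out[pieces[0]] = int(pieces[1])  # assumption: integer values
    return out

# One dict per row; missing keys become NaN, then 0.
parsed = pd.DataFrame([parse_pairs(s) for s in df["col1"]], index=df.index)
parsed = parsed.reindex(columns=names).fillna(0).astype(int)
parsed.columns = ["var{}".format(i) for i in range(1, len(names) + 1)]

result = pd.concat([df[["ID"]], parsed], axis=1)
print(result)
```

Using reindex with the list fixes the column order regardless of which keys appear first in the data, and any name in the list that never occurs still gets a column of zeros.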

Upvotes: 2
