Datacrawler

Reputation: 2876

Dynamic dataframe column name in apply function

I am using the following dataframe:

import pandas as pd

df = pd.DataFrame({'columnA':[1111,1111,2222,3333,4444,4444,5555,6666],
                   'columnB':['AAAA','AAAA','BBBB','AAAA','BBBB','BBBB','AAAA','BBBB'],
                   'columnC':['one','two','one','one','one','sales','two','one'],
                   'NUM1':[1,3,5,7,1,0,4,5],
                   'NUM2':[5,3,6,9,2,4,1,1],
                   'W':list('aaabbbbb')})

and I am trying to use dynamic column names in the following code:

# First, aggregate the data
d = {'columnB': 'unique', 'columnC': 'unique'}
df2 = df.groupby('columnA').agg(d)

# Convert the array of unique values in each cell to a comma-separated string
mylist = ["columnB", "columnC"]
for columnName in mylist:
    df2[columnName] = df2[columnName].apply(', '.join)

It works fine in Jupyter, but it fails when I run it in Visual Studio with this error:

sequence item 0: expected str instance, float found

Printing the dataframe's type gives:

<class 'pandas.core.frame.DataFrame'>

Here is the full error message:

Traceback (most recent call last):
  File "stage1.py", line 112, in <module>
    main()
  File "stage1.py", line 57, in main
    templateScenarios[columnName] = templateScenarios[columnName].apply(', '.join)
  File "/Users/apolo.siskos/anaconda3/lib/python3.6/site-packages/pandas/core/series.py", line 2355, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1574, in pandas._libs.lib.map_infer
TypeError: sequence item 0: expected str instance, float found
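The message comes from `str.join`, which accepts only strings. One plausible explanation for the Jupyter/Visual Studio difference (an assumption, since the traceback shows a different dataframe, `templateScenarios`) is that the data in the failing run contains a missing value, and pandas represents missing values as the float `NaN`. A minimal sketch reproducing the exact message with a hypothetical cell holding only `NaN`:

```python
import numpy as np

# Hypothetical aggregated cell: the result of 'unique' on a group whose
# only value is missing. np.nan is a Python float, not a str.
cell = np.array([np.nan], dtype=object)

message = None
try:
    ', '.join(cell)  # str.join rejects the float at position 0
except TypeError as exc:
    message = str(exc)

print(message)  # sequence item 0: expected str instance, float found
```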

Upvotes: 1

Views: 1119

Answers (1)

jezrael

Reputation: 862551

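The problem is `NaN` values: `', '.join` only accepts strings, and `NaN` is a float. You can remove them with `dropna` inside a custom aggregation function that also takes the unique values and joins them: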

import numpy as np
import pandas as pd

df = pd.DataFrame({'columnA':[1111,1111,2222,3333,4444,4444,5555,6666],
                   'columnB':[np.nan,np.nan,'BBBB','AAAA','BBBB','BBBB','AAAA','BBBB'],
                   'columnC':['one','two','one','one','one','sales','two','one'],
                   'NUM1':[1,3,5,7,1,0,4,5],
                   'NUM2':[5,3,6,9,2,4,1,1],
                   'W':list('aaabbbbb')})

f = lambda x: ', '.join(x.dropna().unique())
d = {'columnB': f, 'columnC':f}
df2 = df.groupby('columnA').agg(d)
print(df2)
        columnB     columnC
columnA                    
1111               one, two
2222       BBBB         one
3333       AAAA         one
4444       BBBB  one, sales
5555       AAAA         two
6666       BBBB         one
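If you prefer to keep the question's two-step structure (aggregate with `'unique'`, then convert each column in a loop over dynamic column names), a sketch of the same idea is to filter out missing values inside the function passed to `apply`. The helper `join_non_null` is a hypothetical name, and wrapping each value in `str()` is an extra assumption so that non-string values such as numbers do not break the join either:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'columnA': [1111, 1111, 2222, 3333, 4444, 4444, 5555, 6666],
                   'columnB': [np.nan, np.nan, 'BBBB', 'AAAA', 'BBBB', 'BBBB', 'AAAA', 'BBBB'],
                   'columnC': ['one', 'two', 'one', 'one', 'one', 'sales', 'two', 'one']})

# Step 1: aggregate as in the question; each cell holds an array of unique values
df2 = df.groupby('columnA').agg({'columnB': 'unique', 'columnC': 'unique'})

def join_non_null(values):
    """Hypothetical helper: join the non-missing entries of one cell."""
    return ', '.join(str(v) for v in values if pd.notna(v))

# Step 2: look up each column by name, as in the question's loop
for columnName in ['columnB', 'columnC']:
    df2[columnName] = df2[columnName].apply(join_non_null)

print(df2)
```

A group whose values are all missing (here `1111` in `columnB`) ends up as an empty string, matching the answer's output.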

Upvotes: 1
