Reputation: 101
I have this input dataframe:
Test1 Test2 Test3 Subject
0 45 NaN NaN Python
1 50 NaN NaN Python
2 NaN 30 NaN Python
3 NaN 35 NaN OS
4 NaN 38 NaN OS
5 NaN 43 NaN Java
6 NaN 32 NaN DS
7 NaN NaN 49 DS
8 NaN 25 NaN DS
9 NaN 34 NaN DS
Expected output is (Dataframe):
Subject Test1 Test2 Test3
Python 45,50 30
OS 35,38
Java 43
DS 32,25,34 49
I've tried this code:
df.groupby(['subject']).sum().reset_index().assign(subject =lambda x: x['subject'].where(~x['subject'].duplicated(), '')).to_csv('filename.csv', index=False)
It's not giving desired output.
Upvotes: 1
Views: 44
Reputation: 863651
Use custom function with remove missing values by Series.dropna
, if necessary convert to integer and then if some numeric values convert to string
s and use join
:
f = lambda x: ','.join(x.dropna().astype(int).astype(str))
df = df.groupby('Subject', sort=False).agg(f).reset_index()
print (df)
Subject Test1 Test2 Test3
0 Python 45,50 30
1 OS 35,38
2 Java 43
3 DS 32,25,34 49
Another idea without convert to integers working if many different formats of values (e.g. some columns are numeric and some strings):
f = lambda x: ','.join(x.dropna().astype(str))
df = df.groupby('Subject', sort=False).agg(f).reset_index()
print (df)
Subject Test1 Test2 Test3
0 Python 45.0,50.0 30.0
1 OS 35.0,38.0
2 Java 43.0
3 DS 32.0,25.0,34.0 49.0
Upvotes: 2