codewarrior
codewarrior

Reputation: 101

grouping in pandas dataframe

I have this input dataframe:


                Test1               Test2               Test3           Subject
0                 45                 NaN                NaN              Python
1                 50                 NaN                NaN              Python
2                NaN                 30                 NaN              Python
3                NaN                 35                 NaN                OS   
4                NaN                 38                 NaN                OS
5                NaN                 43                 NaN               Java
6                NaN                 32                 NaN                 DS
7                NaN                 NaN                49                  DS
8                NaN                 25                 NaN                 DS 
9                NaN                 34                 NaN                 DS  

Expected output is (Dataframe):

Subject         Test1           Test2           Test3
Python          45,50             30    
OS                              35,38               
Java                            43
DS                              32,25,34         49

I've tried this code:

df.groupby(['subject']).sum().reset_index().assign(subject =lambda x: x['subject'].where(~x['subject'].duplicated(), '')).to_csv('filename.csv', index=False)

It's not giving desired output.

Upvotes: 1

Views: 44

Answers (1)

jezrael
jezrael

Reputation: 863651

Use custom function with remove missing values by Series.dropna, if necessary convert to integer and then if some numeric values convert to strings and use join:

f = lambda x: ','.join(x.dropna().astype(int).astype(str))
df = df.groupby('Subject', sort=False).agg(f).reset_index()
print (df)
  Subject  Test1     Test2 Test3
0  Python  45,50        30      
1      OS            35,38      
2    Java               43      
3      DS         32,25,34    49

Another idea without convert to integers working if many different formats of values (e.g. some columns are numeric and some strings):

f = lambda x: ','.join(x.dropna().astype(str))
df = df.groupby('Subject', sort=False).agg(f).reset_index()
print (df)
  Subject      Test1           Test2 Test3
0  Python  45.0,50.0            30.0      
1      OS                  35.0,38.0      
2    Java                       43.0      
3      DS             32.0,25.0,34.0  49.0

Upvotes: 2

Related Questions