Reputation: 3526
I am a newbie in pandas. I am trying the same thing as this answer is saying.
I have data like
SubjectID Visit Value1 Value2 Value3
1 1.57 1.75 1.56
1 1.56 N/A N/A
B1 2 N/A 1.56 N/A
and I want data like
SubjectID Visit Value1 Value2 Value3
1 1.57,1.56 1.75 1.56
B1 2 N/A 1.56 N/A
What I tried is
#1. Did not work
df.groupby(['SubjectID', 'Visit'])['Value1']\
.apply(lambda val: ','.join(val)).reset_index()
#2. did not work
df = df.groupby(['SubjectID', 'Visit'])['Value1']\
.apply(lambda val: ','.join(val)).reset_index()
#3. did not work
df.groupby(['SubjectID', 'Visit'])['Value1', 'Value2', 'Value3']\
.apply(lambda v1, v2, v3 : ','.join(val1), ','.join(val2), ','.join(val3)).reset_index()
#4. did not work
df.groupby(['SubjectID', 'Visit'])\
['Value1'].apply(lambda v1:','.join(val1)) \
['Value2'].apply(lambda v2:','.join(val2)) \
['Value3'].apply(lambda v3:','.join(val3)).reset_index()
Any suggestions to get this working?
Upvotes: 2
Views: 147
Reputation: 19957
Use groupby and join the values from same group:
(
df.fillna('')
.astype(str).groupby(by=['SubjectID', 'Visit'])
.agg(lambda x: ','.join(x.loc[x!='']))
.reset_index()
)
SubjectID Visit Value1 Value2 Value3
0 1 1.57,1.56 1.75 1.56
1 B1 1 1.56
Upvotes: 1
Reputation: 863166
Use GroupBy.agg
with custom lambda function with remove missing values by Series.dropna
and cast to strings and if all values are missing is returned np.nan
:
f = lambda val: np.nan if val.isna().all() else ','.join(val.dropna().astype(str))
df = df.groupby(['SubjectID', 'Visit']).agg(f).reset_index()
print (df)
SubjectID Visit Value1 Value2 Value3
0 1 1.57,1.56 1.75 1.56
1 B1 2 NaN 1.56 NaN
If need empty string instead missing values, solution is simplier:
f = lambda val: ','.join(val.dropna().astype(str))
df1 = df.groupby(['SubjectID', 'Visit']).agg(f).reset_index()
print (df1)
SubjectID Visit Value1 Value2 Value3
0 1 1.57,1.56 1.75 1.56
1 B1 2 1.56
Upvotes: 3