Reputation: 1265
I have the following list of columns and Dataframe in pandas respectively
import pandas as pd
df_1=pd.DataFrame()
files_list=["Col1", "Col2", "Col3"]
df_1["ID"]=["S", "S", "T", "L", "L", "L"]
df_1["Col1"]=[["A"], ["A1"], ["B"], ["B2"], ["C2"], ["D1"]]
df_1["Col2"]=[["R"], ["R1"], ["S"], ["R2"], ["Q2"], ["T1"]]
df_1["Col3"]=[["y"], ["ii1"], [], ["m2"], ["i2"], ["p1"]]
The dataframe looks like this.
ID Col1 Col2 Col3
0 S [A] [R] [y]
1 S [A1] [R1] [ii1]
2 T [B] [S] []
3 L [B2] [R2] [m2]
4 L [C2] [Q2] [i2]
5 L [D1] [T1] [p1]
I need to group the columns Col1, Col2 and Col3 to result in the following dataframe
ID Col1 Col2 Col3
0 S [A , A1] [R, R1] [y, ii1]
2 T [B] [S] []
3 L [B2, C2, D1] [R2, Q2, T1] [m2, i2, p1]
I have tried the following code
files_list=["Col1", "Col2", "Col3"]
df_1 = df_1.groupby('ID').agg( {files_list})
I have also tried this
df_1=df_1.groupby(['ID'])(files_list).apply(','.join).reset_index()
Is there a way to accomplish this . I would like to know where I am making a mistake here
Upvotes: 1
Views: 718
Reputation: 863166
Use nested list comprehension in GroupBy.agg
with filtered columns names in list
:
files_list=["Col1", "Col2", "Col3"]
f = lambda x: [z for y in x for z in y]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(f)
If performance is not important or small DataFrame is possible use sum
for join lists:
files_list=["Col1", "Col2", "Col3"]
df_1 = df_1.groupby('ID', sort=False, as_index=False)[files_list].agg(sum)
print (df_1)
ID Col1 Col2 Col3
0 S [A, A1] [R, R1] [y, ii1]
1 T [B] [S] []
2 L [B2, C2, D1] [R2, Q2, T1] [m2, i2, p1]
Upvotes: 3