Reputation: 158
I have ten dataframes with same column names 'Name' and 'data' respectively.
Using groupby and aggregation for all the dataset I am able to get the desired output but it is a lot of effort for ten dataset's and the margin of error increases because I need to maintain these dataset's separate. examples and codes provided below.
Df1:
Name data
Foo Product
Foo Misc
Bar Product
Bar Item
Df2:
Name data
Foo Misc
Foo Product
Bar Product
Bar Item
Desired output:
Df1:
Name data
Foo Product,Misc
Bar Product,Item
Df2:
Name data
Foo Misc, Product
Bar Product,Item
Currently I am using the below code to achieve this task
Group1= Df1.groupby('Name')['data'].agg(['data',','.join)]).reset_index()
Group2 = Df2.groupby('Name')['data'].agg(['data',','.join)]).reset_index()
Have tried the below but did not work
Group = [Df1,Df2]
for df in Group:
df.groupby('Name')['data'].agg(['data',','.join)]).reset_index()
Also based on some suggestions tried the below
Group = [Df1,Df2]
for df in Group:
df = df.groupby('Name')['data'].agg(['data',','.join)]).reset_index()
Both did not produce any result no error on code but it's giving me the file without any changes.
Upvotes: 1
Views: 176
Reputation: 528
a = [df1,df2]
for df in a:
tmp = df.groupby(['Name'])['data'].apply(','.join).reset_index()
df = df.append(tmp,ignore_index = True)
this will not change df1 and df2, but a[0] and a[1] will be updated, so if you don't mind accessing threw the list, you have there the updated tables
Upvotes: 0
Reputation: 13349
Df1 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
'Data':['Product','Misc', 'Product', 'Item'],
})
Df2 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
'Data':['Misc', 'Product', 'Product', 'Item'],
})
Solution
fields=[f'Df{i}' for i in range(1,3)]
dfsout=[Df1, Df2]
variables = locals()
for d,name in zip(dfsout,fields):
variables["{0}".format(name)]=pd.DataFrame(d.groupby('Name')['Data'].apply(list)).reset_index(level=0)
Df1:
Name Data
0 Foo Product
1 Foo Misc
2 Bar Product
3 Bar Item
Df2:
Name Data
0 Foo Misc
1 Foo Product
2 Bar Product
3 Bar Item
After Implementing the solution part:
Df1:
Data
Name
Bar [Product, Item]
Foo [Product, Misc]
Df2:
Data
Name
Bar [Product, Item]
Foo [Misc, Product]
Upvotes: 1
Reputation: 153460
My suggestion is to use a dictionary.
dd = {'Df1':Df1,
'Df2':Df2}
for k, v in dd.items():
dd[k] = v.groupby('Name').agg(list)
dd
Output:
{'Df1': Data
Name
Bar [Product, Item]
Foo [Product, Misc],
'Df2': Data
Name
Bar [Product, Item]
Foo [Misc, Product]}
Upvotes: 2