ragethewolf
ragethewolf

Reputation: 158

Groupby using loops in python

I have ten dataframes with same column names 'Name' and 'data' respectively.

Using groupby and aggregation for all the dataset I am able to get the desired output but it is a lot of effort for ten dataset's and the margin of error increases because I need to maintain these dataset's separate. examples and codes provided below.

Df1:

Name data
Foo  Product
Foo  Misc
Bar  Product
Bar  Item

Df2:

Name data
Foo  Misc
Foo  Product
Bar  Product
Bar  Item

Desired output:
Df1:

Name data
Foo  Product,Misc
Bar  Product,Item

Df2:

Name data
Foo  Misc, Product
Bar  Product,Item

Currently I am using the below code to achieve this task

Group1= Df1.groupby('Name')['data'].agg(['data',','.join)]).reset_index()

Group2 = Df2.groupby('Name')['data'].agg(['data',','.join)]).reset_index()

Have tried the below but did not work

Group = [Df1,Df2]

for df in Group:
     df.groupby('Name')['data'].agg(['data',','.join)]).reset_index()

Also based on some suggestions tried the below
Group = [Df1,Df2]

for df in Group:
   df =  df.groupby('Name')['data'].agg(['data',','.join)]).reset_index()

Both did not produce any result no error on code but it's giving me the file without any changes.


Upvotes: 1

Views: 176

Answers (3)

trigonom
trigonom

Reputation: 528

a = [df1,df2]
for df in a:
    tmp = df.groupby(['Name'])['data'].apply(','.join).reset_index()
    df = df.append(tmp,ignore_index = True)

this will not change df1 and df2, but a[0] and a[1] will be updated, so if you don't mind accessing threw the list, you have there the updated tables

Upvotes: 0

Pygirl
Pygirl

Reputation: 13349

Df1 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
                   'Data':['Product','Misc', 'Product', 'Item'],
                   })

Df2 = pd.DataFrame({'Name':['Foo','Foo','Bar','Bar'],
                   'Data':['Misc', 'Product', 'Product', 'Item'],
                   })

Solution

fields=[f'Df{i}' for i in range(1,3)]
dfsout=[Df1, Df2]
variables = locals()
for d,name in zip(dfsout,fields):
    variables["{0}".format(name)]=pd.DataFrame(d.groupby('Name')['Data'].apply(list)).reset_index(level=0)

Df1:

    Name    Data
0   Foo Product
1   Foo Misc
2   Bar Product
3   Bar Item

Df2:

Name    Data
0   Foo Misc
1   Foo Product
2   Bar Product
3   Bar Item

After Implementing the solution part:

Df1:

     Data
Name    
Bar [Product, Item]
Foo [Product, Misc]

Df2:

     Data
Name    
Bar [Product, Item]
Foo [Misc, Product]

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

My suggestion is to use a dictionary.

dd = {'Df1':Df1,
      'Df2':Df2}


for k, v in dd.items():
    dd[k] = v.groupby('Name').agg(list)

dd

Output:

{'Df1':                  Data
 Name                 
 Bar   [Product, Item]
 Foo   [Product, Misc], 
 'Df2':                  Data
 Name                 
 Bar   [Product, Item]
 Foo   [Misc, Product]}

Upvotes: 2

Related Questions