J T
J T

Reputation: 255

Python pandas Grouping in a different way

I have many dataframes read from pdf files. And they look like this:

e.g order 1 - this is a dataframe:
code    description   price   quantity       
000001  product A       1       10
000002  product B       2       20
000003  product C       3       30 
...

order 2 - this is a dataframe:
code    description   price   quantity
000001  product A       1       100
000002  product B       2       20
000004  product D       4       40

There will be orders 3, 4 etc I like to join them up and group by the code(which is unique) but display separately the quantities.

code    description   price   order1 quantity  order2 quantity
000001  product A       1       10              100
000002  product B       2       20              20
000003  product C       3       30               0 
000004  product D       4        0              40

Apart from using tedious loops, I wonder if there is a cleaner way to achieve this in pandas. I used to process this in excel vba previously using ugly loops and is my first time trying it in pandas.

Thanks alot for any help!

Upvotes: 1

Views: 70

Answers (1)

jezrael
jezrael

Reputation: 863256

You can create MultiIndex Series by DataFrame.set_index or aggregate sum in list comprehension and then join together by concat:

dfs = [df1, df2, ..., dfn]

#if triples code, description, price are unique
L = [x.set_index(['code', 'description', 'price'])['quantity'] for x in dfs]

#if triples are not unique and necessary aggregate sum
L = [x.groupby(['code', 'description', 'price'])['quantity'].sum() for x in dfs]

df1 = (pd.concat(L, axis=1, keys=range(1, len(dfs) + 1))
         .add_prefix('quantity')
         .fillna(0)
         .astype(int)
         .reset_index())
print (df1)
     code description  price  quantity1  quantity2
0  000001   product A      1         10        100
1  000002   product B      2         20         20
2  000003   product C      3         30          0
3  000004   product D      4          0         40

EDIT:

For set default column names for each DataFrame in list comprehension is possible by DataFrame.set_axis:

L = [x.set_axis(np.arange(len(x.columns)), axis=1, inplace=False).set_index([0,1,2])[3] 
                                             for x in dfs]

Upvotes: 1

Related Questions