Reputation: 255
I have many dataframes read from pdf files. And they look like this:
e.g order 1 - this is a dataframe:
code description price quantity
000001 product A 1 10
000002 product B 2 20
000003 product C 3 30
...
order 2 - this is a dataframe:
code description price quantity
000001 product A 1 100
000002 product B 2 20
000004 product D 4 40
There will be orders 3, 4 etc I like to join them up and group by the code(which is unique) but display separately the quantities.
code description price order1 quantity order2 quantity
000001 product A 1 10 100
000002 product B 2 20 20
000003 product C 3 30 0
000004 product D 4 0 40
Apart from using tedious loops, I wonder if there is a cleaner way to achieve this in pandas. I used to process this in excel vba previously using ugly loops and is my first time trying it in pandas.
Thanks alot for any help!
Upvotes: 1
Views: 70
Reputation: 863256
You can create MultiIndex Series
by DataFrame.set_index
or aggregate sum
in list comprehension and then join together by concat
:
dfs = [df1, df2, ..., dfn]
#if triples code, description, price are unique
L = [x.set_index(['code', 'description', 'price'])['quantity'] for x in dfs]
#if triples are not unique and necessary aggregate sum
L = [x.groupby(['code', 'description', 'price'])['quantity'].sum() for x in dfs]
df1 = (pd.concat(L, axis=1, keys=range(1, len(dfs) + 1))
.add_prefix('quantity')
.fillna(0)
.astype(int)
.reset_index())
print (df1)
code description price quantity1 quantity2
0 000001 product A 1 10 100
1 000002 product B 2 20 20
2 000003 product C 3 30 0
3 000004 product D 4 0 40
EDIT:
For set default column names for each DataFrame in list comprehension is possible by DataFrame.set_axis
:
L = [x.set_axis(np.arange(len(x.columns)), axis=1, inplace=False).set_index([0,1,2])[3]
for x in dfs]
Upvotes: 1