I have two DataFrames and I am trying to compare two columns, Cat1 and Cat2. Where both Cat1 and Cat2 match between the frames, I want to sum the values in the Prc column.
So, in the example below, the only rows that meet the criteria are rows 0 and 4 of df[0], which match rows 1 and 4 of df[1]. In this case the sum would be 200 for df[0] and 185 for df[1].
df[0]
   Cat1  Cat2  Cat3  Prc
0    11     0     5  100
1    22     2     9  150
2    33     1     8   50
3    44     2     6  200
4    55     1     8  100
df[1]
   Cat1  Cat2  Cat3  Prc
0    66     1     6  120
1    11     0     5   90
2    44     1     6  185
3    77     2     7  145
4    55     1     5   95
I am able to compare Cat1 in df[0] vs df[1] using .isin, but if that is all I did, I would pick up row 3 of df[0] even though its Cat2 value differs between df[0] and df[1].
How do I compare two columns in different DataFrames at the same time?
These are large DataFrames of 500,000 rows x 32 columns each, so I want to avoid creating new DataFrames or new columns.
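A minimal reproducible sketch of the sample data, rebuilt by hand from the printed frames and stored as a list so that `df[0]` and `df[1]` match the notation in the post. It also demonstrates the single-column `.isin` pitfall: matching on Cat1 alone keeps row 3 of df[0] as well.

```python
import pandas as pd

# Sample frames from the question, kept in a list so df[0] / df[1]
# follow the post's notation.
df = [
    pd.DataFrame({'Cat1': [11, 22, 33, 44, 55],
                  'Cat2': [0, 2, 1, 2, 1],
                  'Cat3': [5, 9, 8, 6, 8],
                  'Prc':  [100, 150, 50, 200, 100]}),
    pd.DataFrame({'Cat1': [66, 11, 44, 77, 55],
                  'Cat2': [1, 0, 1, 2, 1],
                  'Cat3': [6, 5, 6, 7, 5],
                  'Prc':  [120, 90, 185, 145, 95]}),
]

# Matching on Cat1 alone also keeps row 3 of df[0] (Cat1=44), because
# 44 appears in df[1] even though its Cat2 value is different there.
cat1_only = df[0]['Cat1'].isin(df[1]['Cat1'])
print(df[0].loc[cat1_only, 'Prc'].sum())   # 400, not the wanted 200
```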
One idea is to use DataFrame.merge to take the intersection on multiple columns, then select the Prc columns with DataFrame.filter and sum them:
df1 = df[0].merge(df[1], on=['Cat1','Cat2'], suffixes=('_0','_1'))
print (df1)
   Cat1  Cat2  Cat3_0  Prc_0  Cat3_1  Prc_1
0    11     0       5    100       5     90
1    55     1       8    100       5     95
print (df1.filter(like='Prc').sum())
Prc_0 200
Prc_1 185
dtype: int64
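One caveat with a direct merge: it is a join, so if a (Cat1, Cat2) pair occurred more than once in df[1], the matching df[0] rows would be repeated and Prc_0 would be inflated (and vice versa). The sample data has unique key pairs, so the output above is correct. The sketch below assumes duplicates may exist in the real data and merges against distinct key pairs only; it also carries just the key and Prc columns rather than all 32. The names `keys`, `common`, `total0` and `total1` are illustrative.

```python
import pandas as pd

df = [
    pd.DataFrame({'Cat1': [11, 22, 33, 44, 55],
                  'Cat2': [0, 2, 1, 2, 1],
                  'Cat3': [5, 9, 8, 6, 8],
                  'Prc':  [100, 150, 50, 200, 100]}),
    pd.DataFrame({'Cat1': [66, 11, 44, 77, 55],
                  'Cat2': [1, 0, 1, 2, 1],
                  'Cat3': [6, 5, 6, 7, 5],
                  'Prc':  [120, 90, 185, 145, 95]}),
]

keys = ['Cat1', 'Cat2']

# Distinct (Cat1, Cat2) pairs present in both frames; only the key
# columns take part in this merge.
common = df[0][keys].drop_duplicates().merge(df[1][keys].drop_duplicates(), on=keys)

# `common` has unique keys, so these merges never repeat a row and each
# matching Prc value is counted exactly once.
total0 = df[0][keys + ['Prc']].merge(common, on=keys)['Prc'].sum()
total1 = df[1][keys + ['Prc']].merge(common, on=keys)['Prc'].sum()
print(total0, total1)   # 200 185
```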
Another idea is to build a MultiIndex from both columns with DataFrame.set_index, test for the intersection with Index.isin, and filter with boolean indexing:
s1 = df[0].set_index(['Cat1','Cat2'])['Prc']
s2 = df[1].set_index(['Cat1','Cat2'])['Prc']
print (s1[s1.index.isin(s2.index)].sum())
200
print (s2[s2.index.isin(s1.index)].sum())
185
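Since the question asks to avoid new DataFrames, a variant of the same MultiIndex idea builds the key index straight from the two columns with `pd.MultiIndex.from_arrays` instead of calling `set_index`, and uses the resulting boolean array to select Prc. This is a sketch; `common_key_mask` is a hypothetical helper, and it still allocates the two index objects and a boolean mask, just not a re-indexed copy of each frame.

```python
import pandas as pd

df = [
    pd.DataFrame({'Cat1': [11, 22, 33, 44, 55],
                  'Cat2': [0, 2, 1, 2, 1],
                  'Cat3': [5, 9, 8, 6, 8],
                  'Prc':  [100, 150, 50, 200, 100]}),
    pd.DataFrame({'Cat1': [66, 11, 44, 77, 55],
                  'Cat2': [1, 0, 1, 2, 1],
                  'Cat3': [6, 5, 6, 7, 5],
                  'Prc':  [120, 90, 185, 145, 95]}),
]

def common_key_mask(left, right, keys=('Cat1', 'Cat2')):
    """Hypothetical helper: True where left's key tuple also occurs in right."""
    left_idx = pd.MultiIndex.from_arrays([left[k] for k in keys])
    right_idx = pd.MultiIndex.from_arrays([right[k] for k in keys])
    # isin on a MultiIndex compares whole (Cat1, Cat2) tuples.
    return left_idx.isin(right_idx)

mask0 = common_key_mask(df[0], df[1])
mask1 = common_key_mask(df[1], df[0])
print(df[0].loc[mask0, 'Prc'].sum())   # 200
print(df[1].loc[mask1, 'Prc'].sum())   # 185
```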