Omar Amr
Omar Amr

Reputation: 385

Compare 2 dataframes in pandas based on column value

I have 2 data frames. The first one "named x" has the following format:

             ISBN  Age Category  Count
20     000649840X  Middle Adult      6
21     000649840X   Young Adult     16

Where The same ISBN have multiple rows with different categories.

The other dataframe "named y" has the one row for each ISBN value. I want to create two column in the last dataframe one for each category count where the final result should look like:

ISBN      Middle Adult Count       Young Adult Count

I tried

y["Young Adult Count"] = x[(x['Age Category'] == 'Young Adult') & (y['ISBN] == x['ISBN])]['Count']

But it did not work

Upvotes: 2

Views: 204

Answers (2)

jezrael
jezrael

Reputation: 862481

I think you need pivot_table with join (default left join, so if some ISBN missing get NaNs rows) another DataFrame:

print (df1)
         ISBN  Age Category  Count
0  000649840X  Middle Adult      6
1  000649840X   Young Adult     16


print (df2)
         ISBN
0  000649840X
1  0001111111

df3 = df1.pivot_table(index='ISBN', 
                      columns='Age Category', 
                      values='Count', 
                      aggfunc='sum')
df = df2.join(df3.add_suffix(' Count'), on='ISBN')
print (df)
         ISBN  Middle Adult Count  Young Adult Count
0  000649840X                 6.0               16.0
1  0001111111                 NaN                NaN

For inner join use parameter how='inner':

df = df2.join(df3.add_suffix(' Count'), on='ISBN', how='inner')
print (df)
         ISBN  Middle Adult Count  Young Adult Count
0  000649840X                   6                 16

Upvotes: 2

cs95
cs95

Reputation: 402283

groupby + sum + unstack should work.

df.groupby(['ISBN', 'Age Category'])\
  .Count.sum()\
  .unstack()\
  .add_suffix(' Count')

Age Category  Middle Adult Count  Young Adult Count
ISBN
000649840X                     6                 16

If you want to explicitly filter on those two categories, then filter first, with isin -

cat = ['Middle Adult', 'Young Adult']
df = df[df['Age Category'].isin(cat)]

Now, perform the groupby.

Upvotes: 2

Related Questions