Compare 2 dataframes in pandas based on column value

Question

I have 2 data frames. The first one "named x" has the following format:

             ISBN  Age Category  Count
20     000649840X  Middle Adult      6
21     000649840X   Young Adult     16

Where The same ISBN have multiple rows with different categories.

The other dataframe "named y" has the one row for each ISBN value. I want to create two column in the last dataframe one for each category count where the final result should look like:

ISBN      Middle Adult Count       Young Adult Count

I tried

y["Young Adult Count"] = x[(x['Age Category'] == 'Young Adult') & (y['ISBN] == x['ISBN])]['Count']

But it did not work

jezrael · Accepted Answer

I think you need pivot_table with join (default left join, so if some ISBN missing get NaNs rows) another DataFrame:

print (df1)
         ISBN  Age Category  Count
0  000649840X  Middle Adult      6
1  000649840X   Young Adult     16


print (df2)
         ISBN
0  000649840X
1  0001111111

df3 = df1.pivot_table(index='ISBN', 
                      columns='Age Category', 
                      values='Count', 
                      aggfunc='sum')
df = df2.join(df3.add_suffix(' Count'), on='ISBN')
print (df)
         ISBN  Middle Adult Count  Young Adult Count
0  000649840X                 6.0               16.0
1  0001111111                 NaN                NaN

For inner join use parameter how='inner':

df = df2.join(df3.add_suffix(' Count'), on='ISBN', how='inner')
print (df)
         ISBN  Middle Adult Count  Young Adult Count
0  000649840X                   6                 16

Compare 2 dataframes in pandas based on column value

Answers (2)

Related Questions