Reputation: 385
I have 2 data frames.
The first one "named x"
has the following format:
ISBN Age Category Count
20 000649840X Middle Adult 6
21 000649840X Young Adult 16
Where The same ISBN have multiple rows with different categories.
The other dataframe "named y"
has the one row for each ISBN value. I want to create two column in the last dataframe one for each category count where the final result should look like:
ISBN Middle Adult Count Young Adult Count
I tried
y["Young Adult Count"] = x[(x['Age Category'] == 'Young Adult') & (y['ISBN] == x['ISBN])]['Count']
But it did not work
Upvotes: 2
Views: 204
Reputation: 862481
I think you need pivot_table
with join
(default left join, so if some ISBN
missing get NaN
s rows) another DataFrame
:
print (df1)
ISBN Age Category Count
0 000649840X Middle Adult 6
1 000649840X Young Adult 16
print (df2)
ISBN
0 000649840X
1 0001111111
df3 = df1.pivot_table(index='ISBN',
columns='Age Category',
values='Count',
aggfunc='sum')
df = df2.join(df3.add_suffix(' Count'), on='ISBN')
print (df)
ISBN Middle Adult Count Young Adult Count
0 000649840X 6.0 16.0
1 0001111111 NaN NaN
For inner join use parameter how='inner'
:
df = df2.join(df3.add_suffix(' Count'), on='ISBN', how='inner')
print (df)
ISBN Middle Adult Count Young Adult Count
0 000649840X 6 16
Upvotes: 2
Reputation: 402283
groupby
+ sum
+ unstack
should work.
df.groupby(['ISBN', 'Age Category'])\
.Count.sum()\
.unstack()\
.add_suffix(' Count')
Age Category Middle Adult Count Young Adult Count
ISBN
000649840X 6 16
If you want to explicitly filter on those two categories, then filter first, with isin
-
cat = ['Middle Adult', 'Young Adult']
df = df[df['Age Category'].isin(cat)]
Now, perform the groupby
.
Upvotes: 2