Reputation: 5299
df1
A B
a 1
a 1
a 4
b 1
b 3
df2
A B
a 1
a 2
c 3
c 5
df1.groupby("A").size()
a 3
b 2
df2.groupby("A").size()
a 2
c 2
I'd like to get following size
aggregation
df1 df2
a 3 2
b 2 0
c 0 2
Are there any way to achieve this? I'd like to know aggregation method.
If someone has opinion,please let me know. Thanks
Upvotes: 2
Views: 131
Reputation: 863166
In concat
is possible use parameter keys
for new column names, because working with Series
, then replace missing values and convert to integers:
df3 = (pd.concat([df1.groupby("A").size(),
df2.groupby("A").size()], axis=1, keys=['df1','df2'])
.fillna(0)
.astype(int))
print (df3)
df1 df2
a 3 2
b 2 0
c 0 2
Another idea is first use concat
, convert MultiIndex
to columns by DataFrame.reset_index
, so possible use DataFrame.pivot_table
, last use DataFrame.rename_axis
:
df3 = (pd.concat([df1, df2], keys=['df1','df2'])
.reset_index()
.pivot_table(index='A', columns='level_0', aggfunc='size', fill_value=0)
.rename_axis(columns=None, index=None)
)
print (df3)
df1 df2
a 3 2
b 2 0
c 0 2
Upvotes: 1
Reputation: 28709
Just a variant of @David's answer :
frames = {"df1": df1, "df2": df2}
pd.concat(
{key: value.groupby("A").size() for key, value in frames.items()}, axis=1
).fillna(0)
Upvotes: 0
Reputation: 16683
pd.concat
on the two grouped dataframes and pass axis=1
(This is essentailly an outer
join with pd.merge
, but the syntax is a bit more concise)..fillna(0)
, rename columns as desired with .rename()
and use .astype(int)
to make the columns data types integers:df3 = (pd.concat([df1.groupby("A").size(), df2.groupby("A").size()], axis=1)
.fillna(0).rename({0 : 'df1', 1 : 'df2'}, axis=1).astype(int))
df3
Out[1]:
df1 df2
a 3 2
b 2 0
c 0 2
Upvotes: 3
Reputation: 26676
#groupby using agg to rename axis.
g=df1.groupby("A", as_index=False)['B'].agg({'df1':'size'})
h=df2.groupby("A", as_index=False)['B'].agg({'df2':'size'})
#concat the resultant datframe, fffiil, bfill and dro duplicates
(pd.concat([g,h], ignore_index=True).sort_values(by='A').ffill().bfill()).drop_duplicates()
A df1 df2
0 a 3.0 2.0
1 b 2.0 2.0
3 c 2.0 2.0
Upvotes: 0
Reputation: 3031
If you're working in pandas I would do an outer join.
data = [['a','a','a','b','b'],[1,1,4,1,3]]
df1 = pd.DataFrame(data).T
df1.columns = ['A','B']
data = [['a','a','c','c'],[1,2,3,5]]
df2 = pd.DataFrame(data).T
df2.columns = ['A','B']
temp1 = pd.DataFrame(df1.groupby("A").size())
temp2 = pd.DataFrame(df2.groupby("A").size())
df = temp1.merge(temp2,how='outer',left_index=True,right_index=True)
df.fillna(0)
output
0_x 0_y
A
a 3.0 2.0
b 2.0 0.0
c 0.0 2.0
Upvotes: 1
Reputation: 34086
You can do the below:
In [883]: df1 = df1.groupby("A").size().reset_index()
In [884]: df2 = df2.groupby("A").size().reset_index()
In [886]: df1.merge(df2, on='A', how='outer').fillna(0).rename(columns={'0_x': 'df1', '0_y':'df2'}).set_index('A')
Out[886]:
df1 df2
A
a 3.0 2.0
b 2.0 0.0
c 0.0 2.0
Upvotes: 0