Reputation: 12172
I know that I can have percentage values in a pandas.crosstab()
when normalize=True
.
But I want to combine absolute and normalized values in one table.
What I expect is a snipped like this
YEAR 2000 2001 2002
foo n % n % n %
A 1 0.16 0 0.0 1 0.16
This is an MWE producing normalized and counted values in separte tables.
#!/usr/bin/env python3
import pandas as pd
data = {'YEAR': [2001, 2001, 2002, 2000, 2000, 2002],
'foo': ['C', 'B', 'A', 'A', 'B', 'C']}
df = pd.DataFrame(data)
df.YEAR = df.YEAR.astype('int16')
df.foo = df.foo.astype('category')
print(df)
taba = pd.crosstab(df.foo, df.YEAR, margins=True, dropna=False)
tabb = pd.crosstab(df.foo, df.YEAR, margins=True, dropna=False, normalize=True)
print(taba)
print(tabb)
Output of the MWE
YEAR foo
0 2001 C
1 2001 B
2 2002 A
3 2000 A
4 2000 B
5 2002 C
YEAR 2000 2001 2002 All
foo
A 1 0 1 2
B 1 1 0 2
C 0 1 1 2
All 2 2 2 6
YEAR 2000 2001 2002 All
foo
A 0.166667 0.000000 0.166667 0.333333
B 0.166667 0.166667 0.000000 0.333333
C 0.000000 0.166667 0.166667 0.333333
All 0.333333 0.333333 0.333333 1.000000
Upvotes: 2
Views: 2451
Reputation: 23217
You can join the 2 result dataframes and then rearrange the column index, as follows:
tab2 = taba.join(tabb, lsuffix='_n', rsuffix='_%')
tab2.columns = tab2.columns.map(lambda x: tuple(x.split('_')))
tab2 = (tab2.sort_index(ascending=[True, False] , axis=1)
.rename_axis(columns=['YEAR', 'count_pct'], axis=1)
)
Result:
YEAR 2000 2001 2002 All
count_pct n % n % n % n %
foo
A 1 0.166667 0 0.000000 1 0.166667 2 0.333333
B 1 0.166667 1 0.166667 0 0.000000 2 0.333333
C 0 0.000000 1 0.166667 1 0.166667 2 0.333333
All 2 0.333333 2 0.333333 2 0.333333 6 1.000000
Edit:
Breaking down the steps with more explanations on .sort_index()
and rename_axis()
:
The interim resulting layout of tab2
before the last step is as follows:
YEAR 2000 2001 2002 All 2000 2001 2002 All
YEAR n n n n % % % %
foo
A 1 0 1 2 0.166667 0.000000 0.166667 0.333333
B 1 1 0 2 0.166667 0.166667 0.000000 0.333333
C 0 1 1 2 0.000000 0.166667 0.166667 0.333333
All 2 2 2 6 0.333333 0.333333 0.333333 1.000000
Here, 2 more fine-tunings we need to do:
n
and %
is under the same year. We do it by sort_index()
here. axis=1
is to specify the index sorting is on columns instead of row index. The ascending=
parameter is to specify the sorting order of the 2 levels of the column MultiIndex. The first True
is to specify that the YEAR
index should be sorted in ascending order, while the second False
is to specify sorting 'n'
and '%'
in descending order. This is the required sorting for them to appear in the required sequence with 'n'
to show before '%'
.Result:
YEAR 2000 2001 2002 All
YEAR n % n % n % n %
foo
A 1 0.166667 0 0.000000 1 0.166667 2 0.333333
B 1 0.166667 1 0.166667 0 0.000000 2 0.333333
C 0 0.000000 1 0.166667 1 0.166667 2 0.333333
All 2 0.333333 2 0.333333 2 0.333333 6 1.000000
'YEAR'
to 'count_pct'
. As you can see, there are now 2 'YEAR'
on the left of the first and second line of the display. These corresponds to the axis names of first and second level column MultiIndex. We don't want all these axis names are the same. Hence, change by .rename_axis
to:YEAR 2000 2001 2002 All
count_pct n % n % n % n %
foo
A 1 0.166667 0 0.000000 1 0.166667 2 0.333333
B 1 0.166667 1 0.166667 0 0.000000 2 0.333333
C 0 0.000000 1 0.166667 1 0.166667 2 0.333333
All 2 0.333333 2 0.333333 2 0.333333 6 1.000000
The axis name of second (lower) level of column MultiIndex is changed to 'count_pct'
.
Upvotes: 2
Reputation: 93161
Since you already calculated the two cross tabs, the simplest solution is to concatenate them into your final data frame:
taba = pd.crosstab(df.foo, df.YEAR, dropna=False)
tabb = pd.crosstab(df.foo, df.YEAR, dropna=False, normalize=True)
tab = (
pd.concat([taba, tabb], axis=1, keys=['n', '%'])
.swaplevel(axis=1)
.sort_index(axis=1, ascending=[True, False])
.rename_axis(['YEAR', 'foo'], axis=1)
)
Resulting output is:
YEAR 2000 2001 2002
foo n % n % n %
foo
A 1 0.166667 0 0.000000 1 0.166667
B 1 0.166667 1 0.166667 0 0.000000
C 0 0.000000 1 0.166667 1 0.166667
Upvotes: 1