Reputation: 315
Let's say I have a data frame that looks like this:
V290 V311
0 GOOD TOP QUARTER
1 NK-UNASCERTAIN MIDDLE HALF
2 AVERAGE TOP QUARTER
3 POOR NK-UNASCERTAIN
4 POOR MIDDLE HALF
5 GOOD MIDDLE HALF
6 POOR TOP QUARTER
7 AVERAGE MIDDLE HALF
8 POOR MIDDLE HALF
9 AVERAGE MIDDLE HALF
10 POOR MIDDLE HALF
11 POOR MIDDLE HALF
12 AVERAGE MIDDLE HALF
13 AVERAGE TOP QUARTER
And I want to group this data by ['V311'] and see how many GOOD or POOR values fall in every ['V311'] subcategory. I want to produce something like this:
Top Quarter:GOOD:12
POOR:30
Average:15
Middle half:GOOD:5
POOR:19
Average:3
And so on...
Upvotes: 1
Views: 571
Reputation: 30605
You can use pivot_table with unstack, i.e.
df.pivot_table(index='V290',columns='V311',aggfunc='size',fill_value=0).unstack()
V311 V290
MIDDLE HALF AVERAGE 3
GOOD 1
NK-UNASCERTAIN 1
POOR 4
NK-UNASCERTAIN AVERAGE 0
GOOD 0
NK-UNASCERTAIN 0
POOR 1
TOP QUARTER AVERAGE 2
GOOD 1
NK-UNASCERTAIN 0
POOR 1
dtype: int64
Alternatively:
df.groupby(['V290','V311']).size().unstack().fillna(0).unstack()
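One caveat: unstack() introduces NaN wherever a (V290, V311) pair never occurs, so after fillna(0) the counts come back as floats. A minimal sketch (using a small frame shaped like the question's data) that casts back to integers:
import pandas as pd

df = pd.DataFrame({'V290': ['GOOD', 'POOR', 'AVERAGE', 'POOR'],
                   'V311': ['TOP QUARTER', 'MIDDLE HALF',
                            'TOP QUARTER', 'NK-UNASCERTAIN']})

# fillna(0) keeps the float dtype introduced by the NaNs,
# so cast at the end to restore integer counts
counts = df.groupby(['V290', 'V311']).size().unstack().fillna(0).unstack().astype(int)
print(counts)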
If you want percentages, you can divide by the column sums, i.e.
ndf = df.pivot_table(index='V290',columns='V311',aggfunc='size',fill_value=0)
percents = (ndf/ndf.sum()*100).unstack()
V311 V290
MIDDLE HALF AVERAGE 33.333333
GOOD 11.111111
NK-UNASCERTAIN 11.111111
POOR 44.444444
NK-UNASCERTAIN AVERAGE 0.000000
GOOD 0.000000
NK-UNASCERTAIN 0.000000
POOR 100.000000
TOP QUARTER AVERAGE 33.333333
GOOD 33.333333
NK-UNASCERTAIN 0.000000
POOR 33.333333
dtype: float64
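If you want output close to the text format shown in the question, you can iterate the resulting MultiIndex Series; a rough sketch (the exact formatting is just an example):
counts = df.pivot_table(index='V290', columns='V311', aggfunc='size', fill_value=0).unstack()
# each item is ((V311 value, V290 value), count)
for (v311, v290), n in counts.items():
    print(f'{v311}: {v290}: {n}')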
Upvotes: 3
Reputation: 863166
Use a dict comprehension with groupby and value_counts, then convert to dict:
d = {k:v.value_counts().to_dict() for k,v in df.groupby('V311')['V290']}
print (d)
{'NK-UNASCERTAIN': {'POOR': 1},
'MIDDLE HALF': {'POOR': 4, 'NK-UNASCERTAIN': 1, 'AVERAGE': 3, 'GOOD': 1},
'TOP QUARTER': {'POOR': 1, 'AVERAGE': 2, 'GOOD': 1}}
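If you need a table again later, the nested dict converts straight back; a sketch reusing d from above:
# outer keys become columns (V311), inner keys become the index (V290);
# pairs absent from a subcategory appear as NaN, hence the fill and cast
table = pd.DataFrame(d).fillna(0).astype(int)
print(table)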
For output as a Series:
s = df.groupby('V311')['V290'].value_counts()
print (s)
V311 V290
MIDDLE HALF POOR 4
AVERAGE 3
GOOD 1
NK-UNASCERTAIN 1
NK-UNASCERTAIN POOR 1
TOP QUARTER AVERAGE 2
GOOD 1
POOR 1
Name: V290, dtype: int64
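If you prefer a flat DataFrame, be aware that reset_index can raise here, because the Series is named V290 and so is one of its index levels (newer pandas versions name the result count instead); a sketch that renames first:
# rename the counts to avoid clashing with the V290 index level
tidy = s.rename('count').reset_index()
print(tidy)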
EDIT: If you need relative frequencies:
s = df.groupby('V311')['V290'].value_counts(normalize=True)
print (s)
V311 V290
MIDDLE HALF POOR 0.444444
AVERAGE 0.333333
GOOD 0.111111
NK-UNASCERTAIN 0.111111
NK-UNASCERTAIN POOR 1.000000
TOP QUARTER AVERAGE 0.500000
GOOD 0.250000
POOR 0.250000
Name: V290, dtype: float64
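To report these as percentages rather than fractions, scaling the same Series is enough; a sketch:
pct = (df.groupby('V311')['V290'].value_counts(normalize=True) * 100).round(2)
print(pct)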
EDIT1: If you want all missing categories too:
s = df.groupby('V311')['V290'].value_counts()
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print (s)
MIDDLE HALF AVERAGE 3
GOOD 1
NK-UNASCERTAIN 1
POOR 4
NK-UNASCERTAIN AVERAGE 0
GOOD 0
NK-UNASCERTAIN 0
POOR 1
TOP QUARTER AVERAGE 2
GOOD 1
NK-UNASCERTAIN 0
POOR 1
Name: V290, dtype: int64
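An alternative sketch for the same zero-filled result is pd.crosstab, which builds the full contingency table directly:
# crosstab fills missing (V311, V290) pairs with 0;
# stack() returns the long MultiIndex shape shown above
full = pd.crosstab(df['V311'], df['V290']).stack()
print(full)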
Upvotes: 3
Reputation: 29
Using only pandas:
import pandas as pd
dataframe = pd.DataFrame()
dataframe['V311'] = ['MIDDLE','TOP','MIDDLE','TOP','MIDDLE','TOP','TOP']
print(dataframe['V311'].value_counts())
Output:
TOP 4
MIDDLE 3
Name: V311, dtype: int64
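The question asks for counts per (V311, V290) pair, though; DataFrame.value_counts (pandas 1.1+) handles several columns at once. A sketch, adding an illustrative V290 column to the toy frame above:
# illustrative values only, matching the 7 rows of the toy frame
dataframe['V290'] = ['GOOD', 'POOR', 'GOOD', 'POOR', 'POOR', 'GOOD', 'POOR']
# counts every (V311, V290) combination, most frequent first
print(dataframe[['V311', 'V290']].value_counts())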
Upvotes: 1