Reputation: 3456
I have a dataframe with columns of dtype=object, i.e. categorical variables, and I'd like to get the counts of each level for each of them. I'd like the result to be a pretty summary of all categorical variables.
To achieve the aforementioned goals, I tried the following:
(line 1) grab the names of all object-type variables
(line 2) count the number of observations for each level (a, b) of v1
(line 3) rename the column so it reads "count"
stringCol = list(df.select_dtypes(include=['object'])) # list object of categorical variables
a = df.groupby(stringCol[0]).agg({stringCol[0]: 'count'})
a = a.rename(index=str, columns={stringCol[0]: 'count'}); a
count
v1
a 1279
b 2382
I'm not sure how to elegantly get the following result, where the counts for all string columns are printed. Like so (only v1 and v4 are shown, but it should be possible to print such results for a variable number of columns):
    count        count
v1          v4
a    1279   l       32
b    2382   u     3055
            y      549
The way I can think of doing it is to step through stringCol one column at a time and, once stringCol is done, break. But there must be a better way than that; I'm just not sure how to do it.
Upvotes: 1
Views: 71
Reputation: 862511
I think the simplest way is to use a loop:
import pandas as pd

df = pd.DataFrame({'A':list('abaaee'),
                   'B':list('abbccf'),
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aacbbb')})
print (df)
A B C D E F
0 a a 7 1 5 a
1 b b 8 3 3 a
2 a b 9 5 6 c
3 a c 4 7 9 b
4 e c 2 1 2 b
5 e f 3 0 4 b
stringCol = list(df.select_dtypes(include=['object']))
for c in stringCol:
    a = df[c].value_counts().rename_axis(c).to_frame('count')
    #alternative
    #a = df.groupby(c)[c].count().to_frame('count')
    print (a)
count
A
a 3
e 2
b 1
count
B
b 2
c 2
a 1
f 1
count
F
b 3
a 2
c 1
For a list of DataFrames, use a list comprehension:
dfs = [df[c].value_counts().rename_axis(c).to_frame('count') for c in stringCol]
print (dfs)
[ count
A
a 3
e 2
b 1, count
B
b 2
c 2
a 1
f 1, count
F
b 3
a 2
c 1]
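If the side-by-side layout from the question is wanted, one possible sketch is to turn each count table into two plain columns (level and count) and join them with pd.concat along axis=1. The column names such as A_count and the helper name side_by_side_counts are made up here for illustration, and the string columns are picked with exclude=['number'] rather than include=['object'] on the assumption that every non-numeric column should be counted:

```python
import pandas as pd

df = pd.DataFrame({'A': list('abaaee'),
                   'B': list('abbccf'),
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': list('aacbbb')})

# Assumption: every non-numeric column is treated as categorical.
string_cols = list(df.select_dtypes(exclude=['number']))


def side_by_side_counts(frame, cols):
    """Hypothetical helper: one (level, count) column pair per input column."""
    parts = []
    for c in cols:
        # value_counts() gives the counts; reset_index() moves the levels
        # into an ordinary column so every part shares a 0..n-1 index.
        part = frame[c].value_counts().rename_axis(c).reset_index(name=f'{c}_count')
        parts.append(part)
    # Columns with fewer levels are padded with missing values.
    return pd.concat(parts, axis=1)


summary = side_by_side_counts(df, string_cols)
print(summary)
```

Because the columns have different numbers of levels, the shorter ones are padded with NaN, which also turns their counts into floats; summary.fillna('') is one way to tidy that up for display only.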
Upvotes: 1