user2205916

Reputation: 3456

Python: build object of Pandas dataframes

I have a dataframe with several dtype=object columns (i.e. categorical variables), and I'd like the count of each level for each of them. I'd like the result to be a pretty summary of all categorical variables.

To achieve the aforementioned goals, I tried the following:

(line 1) grab the names of all object-type variables

(line 2) count the number of observations for each level (a, b of v1)

(line 3) rename the column so it reads "count"

stringCol = list(df.select_dtypes(include=['object'])) # list object of categorical variables
a = df.groupby(stringCol[0]).agg({stringCol[0]: 'count'})
a = a.rename(index=str, columns={stringCol[0]: 'count'}); a
    count
v1  
a   1279
b   2382

I'm not sure how to elegantly get the following result, where the counts for all string columns are printed (only v1 and v4 are shown, but it should work for a variable number of columns):

    count       count
v1           v4
a   1279     l  32
b   2382     u  3055
             y  549

The way I can think of doing it is:

  1. select one element of stringCol
  2. calculate the count for each level of that column.
  3. store the result in a Pandas dataframe.
  4. store the Pandas dataframe in an object (list?)
  5. repeat
  6. if last element of stringCol is done, break.

But there must be a better way than that; I'm just not sure how to do it.

Upvotes: 1

Views: 71

Answers (1)

jezrael

Reputation: 862511

I think the simplest approach is to use a loop:

import pandas as pd

df = pd.DataFrame({'A':list('abaaee'),
                   'B':list('abbccf'),
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aacbbb')})

print (df)
   A  B  C  D  E  F
0  a  a  7  1  5  a
1  b  b  8  3  3  a
2  a  b  9  5  6  c
3  a  c  4  7  9  b
4  e  c  2  1  2  b
5  e  f  3  0  4  b

stringCol = list(df.select_dtypes(include=['object']))

for c in stringCol:
    a = df[c].value_counts().rename_axis(c).to_frame('count')
    #alternative
    #a = df.groupby(c)[c].count().to_frame('count')
    print (a)

   count
A       
a      3
e      2
b      1
   count
B       
b      2
c      2
a      1
f      1
   count
F       
b      3
a      2
c      1

For a list of DataFrames, use a list comprehension:

dfs = [df[c].value_counts().rename_axis(c).to_frame('count') for c in stringCol]
print (dfs)

[   count
A       
a      3
e      2
b      1,    count
B       
b      2
c      2
a      1
f      1,    count
F       
b      3
a      2
c      1]
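
If the goal is the side-by-side layout from the question, one possible way is to turn each count table into ordinary columns and join the pieces with pd.concat along axis=1. This is only a sketch on the sample data above, not something from the original answer: the names string_cols, parts, summary and the 'level' label are made up for illustration, and the non-numeric columns are picked with pd.api.types.is_numeric_dtype instead of select_dtypes so that the example does not depend on how a given pandas version stores strings.

```python
import pandas as pd

df = pd.DataFrame({'A': list('abaaee'),
                   'B': list('abbccf'),
                   'C': [7, 8, 9, 4, 2, 3],
                   'F': list('aacbbb')})

# Assumption: "categorical" means every non-numeric column
string_cols = [c for c in df.columns
               if not pd.api.types.is_numeric_dtype(df[c])]

# One small frame per column, with the level values and their counts
# as ordinary columns ('level' is an illustrative name)
parts = [df[c].value_counts().rename_axis('level').reset_index(name='count')
         for c in string_cols]

# Put the frames next to each other; keys labels each block with its
# source column, giving two-level column headers such as ('A', 'count')
summary = pd.concat(parts, axis=1, keys=string_cols)
print(summary)
```

Columns with fewer levels are padded with NaN at the bottom, so their counts are shown as floats. If that matters, keeping the separate DataFrames in the list may be the cleaner option.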

Upvotes: 1
