lesolorzanov
lesolorzanov

Reputation: 3586

pandas summarize by count_values as rows

I am trying to summarize some data by their categories in different columns in a dataframe. This is the data.

         feature1  feature2  featurem
brand1   good      none      good
brand2   bad       good      bad
brand..  none      none      good
brandn   good      none      none

I would like to have a table that shows me how many from [good,bad,none] I have per feature. So that would tell me how many brands have good in feature 1, bad in feature 1 and none in feature 1 and so on.

I know, for instance, that with

df["feature1"].value_counts() 

I can get such values for each feature separately, but I would like to add it to a new dataframe where I have all the features. How can I do this?

A resulting table would look like this:

enter image description here

Upvotes: 1

Views: 135

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Try this, using apply and passing pd.Series.value_counts:

df = pd.DataFrame({'feature '+str(i):np.random.choice(['Good','Bad','none'], 20) for i in range(1,10)})

df.apply(pd.Series.value_counts)

Output:

      feature 1  feature 2  feature 3  feature 4  feature 5  feature 6  \
Bad           6         12          6         10          6          4   
Good          6          2          8          5          6          9   
none          8          6          6          5          8          7   

      feature 7  feature 8  feature 9  
Bad           3          6          7  
Good          3          6          4  
none         14          8          9  

Upvotes: 5

user3483203
user3483203

Reputation: 51155

This is melt + crosstab

m = df.melt()
pd.crosstab(m['value'], m['variable'])

variable  feature 1  feature 2  feature m
value
bad               1          0          1
good              2          1          2
none              1          3          1

Upvotes: 4

Related Questions