Reputation: 3586
I am trying to summarize some data by their categories in different columns in a dataframe. This is the data.
feature1 feature2 featurem
brand1 good none good
brand2 bad good bad
brand.. none none good
brandn good none none
I would like to have a table that shows me how many from [good,bad,none] I have per feature. So that would tell me how many brands have good in feature 1, bad in feature 1 and none in feature 1 and so on.
I know, for instance, that with
df["feature1"].value_counts()
I can get such values for each feature separately, but I would like to add it to a new dataframe where I have all the features. How can I do this?
A resulting table would look like this:
Upvotes: 1
Views: 135
Reputation: 153460
Try this, using apply
and passing pd.Series.value_counts
:
df = pd.DataFrame({'feature '+str(i):np.random.choice(['Good','Bad','none'], 20) for i in range(1,10)})
df.apply(pd.Series.value_counts)
Output:
feature 1 feature 2 feature 3 feature 4 feature 5 feature 6 \
Bad 6 12 6 10 6 4
Good 6 2 8 5 6 9
none 8 6 6 5 8 7
feature 7 feature 8 feature 9
Bad 3 6 7
Good 3 6 4
none 14 8 9
Upvotes: 5
Reputation: 51155
This is melt
+ crosstab
m = df.melt()
pd.crosstab(m['value'], m['variable'])
variable feature 1 feature 2 feature m
value
bad 1 0 1
good 2 1 2
none 1 3 1
Upvotes: 4