Reputation: 21898
I have the following DataFrame:
            KPI_01  KPI_02  KPI_03
date
2015-05-24   green   green     red
2015-06-24  orange     red     NaN
I want to count how many times each color appears for each date, in order to obtain:
value       green  orange  red
date
2015-05-24      2       0    1
2015-06-24      0       1    1
Here is my code, which does the job. Is there a better (shorter) way to do it?
import numpy as np
import pandas as pd

# Test data
df = pd.DataFrame({'date': ['05-24-2015', '06-24-2015'],
                   'KPI_01': ['green', 'orange'],
                   'KPI_02': ['green', 'red'],
                   'KPI_03': ['red', np.nan]})
df.set_index('date', inplace=True)
# Transforming to long format
df.reset_index(inplace=True)
long = pd.melt(df, id_vars=['date'])
# Pivoting data
pivoted = pd.pivot_table(long, index='date', columns=['value'], aggfunc='count', fill_value=0)
# Dropping unnecessary level
pivoted.columns = pivoted.columns.droplevel()
Upvotes: 1
Views: 58
Reputation: 353159
You could apply value_counts to each row:
>>> df.apply(pd.Series.value_counts, axis=1).fillna(0)
            green  orange  red
date
05-24-2015      2       0    1
06-24-2015      0       1    1
apply tends to be slow, and row-wise operations are slow as well, but to be honest, if your frame isn't very big you might not even notice the difference.
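If the row-wise apply does turn out to be a bottleneck, one possible alternative is to reshape to long format and cross-tabulate. This is only a sketch, not a benchmarked recommendation: it assumes the frame is indexed by date as in the question, and the names long_df and counts are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same test data as in the question, indexed by date.
df = pd.DataFrame({'date': ['05-24-2015', '06-24-2015'],
                   'KPI_01': ['green', 'orange'],
                   'KPI_02': ['green', 'red'],
                   'KPI_03': ['red', np.nan]}).set_index('date')

# Reshape to long format: one row per (date, KPI) pair.
# Missing colors are dropped explicitly so they are not counted.
long_df = df.reset_index().melt(id_vars='date').dropna(subset=['value'])

# Count the occurrences of each color per date; absent pairs become 0.
counts = pd.crosstab(long_df['date'], long_df['value'])
print(counts)
```

This avoids calling a Python function once per row, and the counts come back as integers, so no fillna step is needed.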
Upvotes: 1