Reputation: 1061
I have a dataframe that looks like this:
Date Daily Risk Score Name
25032 2020-07-07 13.0 Buffalo County, NE, US (Corn)
25033 2020-07-08 8.0 Buffalo County, NE, US (Corn)
25034 2020-07-09 8.0 Buffalo County, NE, US (Corn)
25035 2020-07-10 8.0 Buffalo County, NE, US (Corn)
25036 2020-07-11 8.0 Buffalo County, NE, US (Corn)
25037 2020-07-12 8.0 Buffalo County, NE, US (Corn)
25038 2020-07-13 10.0 Buffalo County, NE, US (Corn)
25039 2020-07-14 7.0 Buffalo County, NE, US (Corn)
25040 2020-07-15 7.0 Buffalo County, NE, US (Corn)
25041 2020-07-16 10.0 Buffalo County, NE, US (Corn)
25042 2020-07-17 13.0 Buffalo County, NE, US (Corn)
25043 2020-07-18 13.0 Buffalo County, NE, US (Corn)
25044 2020-07-19 13.0 Buffalo County, NE, US (Corn)
25045 2020-07-20 13.0 Buffalo County, NE, US (Corn)
25046 2020-07-07 8.0 Morgan County, IL, US (Corn)
25047 2020-07-08 8.0 Morgan County, IL, US (Corn)
25048 2020-07-09 8.0 Morgan County, IL, US (Corn)
25049 2020-07-10 8.0 Morgan County, IL, US (Corn)
25050 2020-07-11 8.0 Morgan County, IL, US (Corn)
25051 2020-07-12 8.0 Morgan County, IL, US (Corn)
25052 2020-07-13 8.0 Morgan County, IL, US (Corn)
25053 2020-07-14 8.0 Morgan County, IL, US (Corn)
25054 2020-07-15 8.0 Morgan County, IL, US (Corn)
25055 2020-07-16 9.0 Morgan County, IL, US (Corn)
25056 2020-07-17 10.0 Morgan County, IL, US (Corn)
25057 2020-07-18 10.0 Morgan County, IL, US (Corn)
25058 2020-07-19 8.0 Morgan County, IL, US (Corn)
25059 2020-07-20 8.0 Morgan County, IL, US (Corn)
25060 2020-07-07 9.0 Gray County, KS, US (Corn)
25061 2020-07-08 24.0 Gray County, KS, US (Corn)
25062 2020-07-09 25.0 Gray County, KS, US (Corn)
25063 2020-07-10 13.0 Gray County, KS, US (Corn)
25064 2020-07-11 24.0 Gray County, KS, US (Corn)
25065 2020-07-12 23.0 Gray County, KS, US (Corn)
25066 2020-07-13 25.0 Gray County, KS, US (Corn)
25067 2020-07-14 25.0 Gray County, KS, US (Corn)
25068 2020-07-15 25.0 Gray County, KS, US (Corn)
25069 2020-07-16 25.0 Gray County, KS, US (Corn)
25070 2020-07-17 25.0 Gray County, KS, US (Corn)
25071 2020-07-18 24.0 Gray County, KS, US (Corn)
25072 2020-07-19 21.0 Gray County, KS, US (Corn)
25073 2020-07-20 20.0 Gray County, KS, US (Corn)
I would like to make a new dataframe that lists the date and the number of times each day has a value from 0<x<9, 9<x<17 and >=17. The new dataframe would like this:
Date Low count Mid count High count
2020-07-07 x y z
2020-07-08 a b c
2020-07-09 d e f
...
Should I use groupby to attain this?
Upvotes: 1
Views: 41
Reputation: 153460
You can do this with pd.cut
and pd.crosstab
:
df['count'] = pd.cut(df['Daily Risk Score'],
bins=[0,9,17,np.inf],
labels=['Low Count', 'Mid Count', 'High Count'])
pd.crosstab(df['Date'], df['count'])
Output:
count Low Count Mid Count High Count
Date
2020-07-07 2 1 0
2020-07-08 2 0 1
2020-07-09 2 0 1
2020-07-10 2 1 0
2020-07-11 2 0 1
2020-07-12 2 0 1
2020-07-13 1 1 1
2020-07-14 2 0 1
2020-07-15 2 0 1
2020-07-16 1 1 1
2020-07-17 0 2 1
2020-07-18 0 2 1
2020-07-19 1 1 1
2020-07-20 1 1 1
Upvotes: 3
Reputation: 29635
you can use groupby.count
by the column Date and with a series made with pd.cut
on the score column to label each value with low, mid or high. Then unstack
to get column for each category.
df_ = (df.groupby([df['Date'],
pd.cut(df['score'], bins=[0, 9, 17, np.inf],
labels=['low','mid','high'])])
['score'].count()
.unstack()
)
print (df_)
score low mid high
Date
2020-07-07 2 1 0
2020-07-08 2 0 1
2020-07-09 2 0 1
2020-07-10 2 1 0
2020-07-11 2 0 1
2020-07-12 2 0 1
2020-07-13 1 1 1
2020-07-14 2 0 1
2020-07-15 2 0 1
2020-07-16 1 1 1
2020-07-17 0 2 1
2020-07-18 0 2 1
2020-07-19 1 1 1
2020-07-20 1 1 1
Upvotes: 4