Eli Turasky
Eli Turasky

Reputation: 1061

Count number of occurrences of a value through time

I have a dataframe that looks like this:

        Date        Daily Risk Score    Name
25032   2020-07-07  13.0                Buffalo County, NE, US (Corn)
25033   2020-07-08  8.0                 Buffalo County, NE, US (Corn)
25034   2020-07-09  8.0                 Buffalo County, NE, US (Corn)
25035   2020-07-10  8.0                 Buffalo County, NE, US (Corn)
25036   2020-07-11  8.0                 Buffalo County, NE, US (Corn)
25037   2020-07-12  8.0                 Buffalo County, NE, US (Corn)
25038   2020-07-13  10.0                Buffalo County, NE, US (Corn)
25039   2020-07-14  7.0                 Buffalo County, NE, US (Corn)
25040   2020-07-15  7.0                 Buffalo County, NE, US (Corn)
25041   2020-07-16  10.0                Buffalo County, NE, US (Corn)
25042   2020-07-17  13.0                Buffalo County, NE, US (Corn)
25043   2020-07-18  13.0                Buffalo County, NE, US (Corn)
25044   2020-07-19  13.0                Buffalo County, NE, US (Corn)
25045   2020-07-20  13.0                Buffalo County, NE, US (Corn)
25046   2020-07-07  8.0                 Morgan County, IL, US (Corn)
25047   2020-07-08  8.0                 Morgan County, IL, US (Corn)
25048   2020-07-09  8.0                 Morgan County, IL, US (Corn)
25049   2020-07-10  8.0                 Morgan County, IL, US (Corn)
25050   2020-07-11  8.0                 Morgan County, IL, US (Corn)
25051   2020-07-12  8.0                 Morgan County, IL, US (Corn)
25052   2020-07-13  8.0                 Morgan County, IL, US (Corn)
25053   2020-07-14  8.0                 Morgan County, IL, US (Corn)
25054   2020-07-15  8.0                 Morgan County, IL, US (Corn)
25055   2020-07-16  9.0                 Morgan County, IL, US (Corn)
25056   2020-07-17  10.0                Morgan County, IL, US (Corn)
25057   2020-07-18  10.0                Morgan County, IL, US (Corn)
25058   2020-07-19  8.0                 Morgan County, IL, US (Corn)
25059   2020-07-20  8.0                 Morgan County, IL, US (Corn)
25060   2020-07-07  9.0                 Gray County, KS, US (Corn)
25061   2020-07-08  24.0                Gray County, KS, US (Corn)
25062   2020-07-09  25.0                Gray County, KS, US (Corn)
25063   2020-07-10  13.0                Gray County, KS, US (Corn)
25064   2020-07-11  24.0                Gray County, KS, US (Corn)
25065   2020-07-12  23.0                Gray County, KS, US (Corn)
25066   2020-07-13  25.0                Gray County, KS, US (Corn)
25067   2020-07-14  25.0                Gray County, KS, US (Corn)
25068   2020-07-15  25.0                Gray County, KS, US (Corn)
25069   2020-07-16  25.0                Gray County, KS, US (Corn)
25070   2020-07-17  25.0                Gray County, KS, US (Corn)
25071   2020-07-18  24.0                Gray County, KS, US (Corn)
25072   2020-07-19  21.0                Gray County, KS, US (Corn)
25073   2020-07-20  20.0                Gray County, KS, US (Corn)

I would like to make a new dataframe that lists the date and the number of times each day has a value from 0<x<9, 9<x<17 and >=17. The new dataframe would like this:

Date   Low count   Mid count   High count
2020-07-07   x   y   z
2020-07-08   a   b   c
2020-07-09   d   e   f
...

Should I use groupby to attain this?

Upvotes: 1

Views: 41

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

You can do this with pd.cut and pd.crosstab:

df['count'] = pd.cut(df['Daily Risk Score'], 
                     bins=[0,9,17,np.inf], 
                     labels=['Low Count', 'Mid Count', 'High Count'])

pd.crosstab(df['Date'], df['count'])

Output:

count       Low Count  Mid Count  High Count
Date                                        
2020-07-07          2          1           0
2020-07-08          2          0           1
2020-07-09          2          0           1
2020-07-10          2          1           0
2020-07-11          2          0           1
2020-07-12          2          0           1
2020-07-13          1          1           1
2020-07-14          2          0           1
2020-07-15          2          0           1
2020-07-16          1          1           1
2020-07-17          0          2           1
2020-07-18          0          2           1
2020-07-19          1          1           1
2020-07-20          1          1           1

Upvotes: 3

Ben.T
Ben.T

Reputation: 29635

you can use groupby.count by the column Date and with a series made with pd.cut on the score column to label each value with low, mid or high. Then unstack to get column for each category.

df_ = (df.groupby([df['Date'], 
                   pd.cut(df['score'], bins=[0, 9, 17, np.inf], 
                          labels=['low','mid','high'])])
         ['score'].count()
         .unstack()
      )
print (df_)
score       low  mid  high
Date                      
2020-07-07    2    1     0
2020-07-08    2    0     1
2020-07-09    2    0     1
2020-07-10    2    1     0
2020-07-11    2    0     1
2020-07-12    2    0     1
2020-07-13    1    1     1
2020-07-14    2    0     1
2020-07-15    2    0     1
2020-07-16    1    1     1
2020-07-17    0    2     1
2020-07-18    0    2     1
2020-07-19    1    1     1
2020-07-20    1    1     1

Upvotes: 4

Related Questions