applying cut() on R dataframe daywise

Question

I have a datatable in R on which I apply a cut() and table(). I am able to get the frequency table based on the conditions. But I am getting overall frequencies. I want to get it day wise. I have a column named as timestamp which have timestamp. Also I have a section column which has value either A or B. How to cut it based on each day each section.

My current output :

Var1    Freq
0-30    1398
30-60   1051
60-80   1006
80-100  36
100>    2

Expected output:

Date        Sec  Var1    Freq
05-01-2020   A   0-30    1398
05-01-2020   A   30-60   1051
05-01-2020   A   60-80   1006
05-01-2020   A   80-100  36
05-01-2020   A   100>    2
05-01-2020   B   0-30    138
05-01-2020   B   30-60   151
05-01-2020   B   60-80   1056
05-01-2020   B   80-100  3
05-01-2020   B   100>    6
06-01-2020   A   0-30    398
06-01-2020   A   30-60   11
06-01-2020   A   60-80   16
06-01-2020   A   80-100  3666
06-01-2020   A   100>    25
06-01-2020   B   0-30    1948
06-01-2020   B   30-60   233
06-01-2020   B   60-80   33
06-01-2020   B   80-100  433
06-01-2020   B   100>    34
and so on.....

My code :

result = as.data.frame(table(cut(df$temp,breaks=c(0,30,60,80,100,1000),labels=c("0-30","30-60","60-80","80-100","100>"))))

Sample data :

timestamp        section    temp
01-05-2020 00:01:22 A   45
02-05-2020 00:01:22 A   10
03-05-2020 00:01:22 A   98
04-05-2020 00:01:22 A   23
05-05-2020 00:01:22 A   67
06-05-2020 00:01:22 A   89
07-05-2020 00:01:22 A   100
08-05-2020 00:01:22 A   110
09-05-2020 00:01:22 A   123
10-05-2020 00:01:22 A   12
01-05-2020 00:01:22 B   23
02-05-2020 00:01:22 B   44
03-05-2020 00:01:22 B   55
04-05-2020 00:01:22 B   12
05-05-2020 00:01:22 B   98
06-05-2020 00:01:22 B   45
07-05-2020 00:01:22 B   87
08-05-2020 00:01:22 B   65
09-05-2020 00:01:22 B   12
10-05-2020 00:01:22 B   8

akrun · Accepted Answer

Here is one option. We do the grouping by 'Date' and the cut on the 'temp' column, then summarise to get the frequency with n()

library(dplyr)
library(lubridate)
df %>%
    group_by(Date = as.Date(dmy_hms(timestamp)), section,
     Var1 = cut(temp, breaks=c(0,30,60,80,100,1000),
           labels=c("0-30","30-60","60-80", "80-100", "100>"))) %>%
    summarise(Freq = n())
# A tibble: 20 x 4
# Groups:   Date, section [20]
#   Date       section Var1    Freq
#             
# 1 2020-05-01 A       30-60      1
# 2 2020-05-01 B       0-30       1
# 3 2020-05-02 A       0-30       1
# 4 2020-05-02 B       30-60      1
# 5 2020-05-03 A       80-100     1
# 6 2020-05-03 B       30-60      1
# 7 2020-05-04 A       0-30       1
# 8 2020-05-04 B       0-30       1
# 9 2020-05-05 A       60-80      1
#10 2020-05-05 B       80-100     1
#11 2020-05-06 A       80-100     1
#12 2020-05-06 B       30-60      1
#13 2020-05-07 A       80-100     1
#14 2020-05-07 B       80-100     1
#15 2020-05-08 A       100>       1
#16 2020-05-08 B       60-80      1
#17 2020-05-09 A       100>       1
#18 2020-05-09 B       0-30       1
#19 2020-05-10 A       0-30       1
#20 2020-05-10 B       0-30       1

data

df <- structure(list(timestamp = c("01-05-2020 00:01:22", "02-05-2020 00:01:22", 
"03-05-2020 00:01:22", "04-05-2020 00:01:22", "05-05-2020 00:01:22", 
"06-05-2020 00:01:22", "07-05-2020 00:01:22", "08-05-2020 00:01:22", 
"09-05-2020 00:01:22", "10-05-2020 00:01:22", "01-05-2020 00:01:22", 
"02-05-2020 00:01:22", "03-05-2020 00:01:22", "04-05-2020 00:01:22", 
"05-05-2020 00:01:22", "06-05-2020 00:01:22", "07-05-2020 00:01:22", 
"08-05-2020 00:01:22", "09-05-2020 00:01:22", "10-05-2020 00:01:22"
), section = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B"), temp = c(45L, 
10L, 98L, 23L, 67L, 89L, 100L, 110L, 123L, 12L, 23L, 44L, 55L, 
12L, 98L, 45L, 87L, 65L, 12L, 8L)), class = "data.frame", row.names = c(NA, 
-20L))

applying cut() on R dataframe daywise

Answers (2)

Data

data

Related Questions