VJ D
VJ D

Reputation: 177

applying cut() on R dataframe daywise

I have a datatable in R on which I apply a cut() and table(). I am able to get the frequency table based on the conditions. But I am getting overall frequencies. I want to get it day wise. I have a column named as timestamp which have timestamp. Also I have a section column which has value either A or B. How to cut it based on each day each section.

My current output :

Var1    Freq
0-30    1398
30-60   1051
60-80   1006
80-100  36
100>    2

Expected output:

Date        Sec  Var1    Freq
05-01-2020   A   0-30    1398
05-01-2020   A   30-60   1051
05-01-2020   A   60-80   1006
05-01-2020   A   80-100  36
05-01-2020   A   100>    2
05-01-2020   B   0-30    138
05-01-2020   B   30-60   151
05-01-2020   B   60-80   1056
05-01-2020   B   80-100  3
05-01-2020   B   100>    6
06-01-2020   A   0-30    398
06-01-2020   A   30-60   11
06-01-2020   A   60-80   16
06-01-2020   A   80-100  3666
06-01-2020   A   100>    25
06-01-2020   B   0-30    1948
06-01-2020   B   30-60   233
06-01-2020   B   60-80   33
06-01-2020   B   80-100  433
06-01-2020   B   100>    34
and so on.....

My code :

result = as.data.frame(table(cut(df$temp,breaks=c(0,30,60,80,100,1000),labels=c("0-30","30-60","60-80","80-100","100>"))))

Sample data :

timestamp        section    temp
01-05-2020 00:01:22 A   45
02-05-2020 00:01:22 A   10
03-05-2020 00:01:22 A   98
04-05-2020 00:01:22 A   23
05-05-2020 00:01:22 A   67
06-05-2020 00:01:22 A   89
07-05-2020 00:01:22 A   100
08-05-2020 00:01:22 A   110
09-05-2020 00:01:22 A   123
10-05-2020 00:01:22 A   12
01-05-2020 00:01:22 B   23
02-05-2020 00:01:22 B   44
03-05-2020 00:01:22 B   55
04-05-2020 00:01:22 B   12
05-05-2020 00:01:22 B   98
06-05-2020 00:01:22 B   45
07-05-2020 00:01:22 B   87
08-05-2020 00:01:22 B   65
09-05-2020 00:01:22 B   12
10-05-2020 00:01:22 B   8

Upvotes: 2

Views: 62

Answers (2)

Uwe
Uwe

Reputation: 42544

This question has been tagged with data.table. So, for the sake of completeness, here is also a data.table solution:

library(data.table)
setDT(df)[, .(Freq = .N), 
          by = .(Date = as.IDate(timestamp, "%d-%m-%Y"), section,
                 Var1 = cut(temp, c(0, 30, 60, 80, 100, Inf), c("0-30", "30-60", "60-80", "80-100", "100>")))]
          Date section   Var1 Freq
 1: 2020-05-01       A  30-60    1
 2: 2020-05-02       A   0-30    1
 3: 2020-05-03       A 80-100    1
 4: 2020-05-04       A   0-30    1
 5: 2020-05-05       A  60-80    1
 6: 2020-05-06       A 80-100    1
 7: 2020-05-07       A 80-100    1
 8: 2020-05-08       A   100>    1
 9: 2020-05-09       A   100>    1
10: 2020-05-10       A   0-30    1
11: 2020-05-01       B   0-30    1
12: 2020-05-02       B  30-60    1
13: 2020-05-03       B  30-60    1
14: 2020-05-04       B   0-30    1
15: 2020-05-05       B 80-100    1
16: 2020-05-06       B  30-60    1
17: 2020-05-07       B 80-100    1
18: 2020-05-08       B  60-80    1
19: 2020-05-09       B   0-30    1
20: 2020-05-10       B   0-30    1

or, in a more programmatic way

breaks <- c(0, 30, 60, 80, 100, Inf)
labels <- stringr::str_replace(paste(head(breaks, -1L), tail(breaks, -1L), sep = "-"), "-Inf", ">")
setDT(df)[, .(Freq = .N), by = .(Date = as.IDate(timestamp, "%d-%m-%Y"), section,
                                 Var1 = cut(temp, breaks, labels))]

Data

library(data.table)
df <- fread(
  "timestamp time       section    temp
01-05-2020 00:01:22 A   45
02-05-2020 00:01:22 A   10
03-05-2020 00:01:22 A   98
04-05-2020 00:01:22 A   23
05-05-2020 00:01:22 A   67
06-05-2020 00:01:22 A   89
07-05-2020 00:01:22 A   100
08-05-2020 00:01:22 A   110
09-05-2020 00:01:22 A   123
10-05-2020 00:01:22 A   12
01-05-2020 00:01:22 B   23
02-05-2020 00:01:22 B   44
03-05-2020 00:01:22 B   55
04-05-2020 00:01:22 B   12
05-05-2020 00:01:22 B   98
06-05-2020 00:01:22 B   45
07-05-2020 00:01:22 B   87
08-05-2020 00:01:22 B   65
09-05-2020 00:01:22 B   12
10-05-2020 00:01:22 B   8")[
  , c("timestamp", "time") := .(paste(timestamp, time), NULL)]

Upvotes: 0

akrun
akrun

Reputation: 886938

Here is one option. We do the grouping by 'Date' and the cut on the 'temp' column, then summarise to get the frequency with n()

library(dplyr)
library(lubridate)
df %>%
    group_by(Date = as.Date(dmy_hms(timestamp)), section,
     Var1 = cut(temp, breaks=c(0,30,60,80,100,1000),
           labels=c("0-30","30-60","60-80", "80-100", "100>"))) %>%
    summarise(Freq = n())
# A tibble: 20 x 4
# Groups:   Date, section [20]
#   Date       section Var1    Freq
#   <date>     <chr>   <fct>  <int>
# 1 2020-05-01 A       30-60      1
# 2 2020-05-01 B       0-30       1
# 3 2020-05-02 A       0-30       1
# 4 2020-05-02 B       30-60      1
# 5 2020-05-03 A       80-100     1
# 6 2020-05-03 B       30-60      1
# 7 2020-05-04 A       0-30       1
# 8 2020-05-04 B       0-30       1
# 9 2020-05-05 A       60-80      1
#10 2020-05-05 B       80-100     1
#11 2020-05-06 A       80-100     1
#12 2020-05-06 B       30-60      1
#13 2020-05-07 A       80-100     1
#14 2020-05-07 B       80-100     1
#15 2020-05-08 A       100>       1
#16 2020-05-08 B       60-80      1
#17 2020-05-09 A       100>       1
#18 2020-05-09 B       0-30       1
#19 2020-05-10 A       0-30       1
#20 2020-05-10 B       0-30       1

data

df <- structure(list(timestamp = c("01-05-2020 00:01:22", "02-05-2020 00:01:22", 
"03-05-2020 00:01:22", "04-05-2020 00:01:22", "05-05-2020 00:01:22", 
"06-05-2020 00:01:22", "07-05-2020 00:01:22", "08-05-2020 00:01:22", 
"09-05-2020 00:01:22", "10-05-2020 00:01:22", "01-05-2020 00:01:22", 
"02-05-2020 00:01:22", "03-05-2020 00:01:22", "04-05-2020 00:01:22", 
"05-05-2020 00:01:22", "06-05-2020 00:01:22", "07-05-2020 00:01:22", 
"08-05-2020 00:01:22", "09-05-2020 00:01:22", "10-05-2020 00:01:22"
), section = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B"), temp = c(45L, 
10L, 98L, 23L, 67L, 89L, 100L, 110L, 123L, 12L, 23L, 44L, 55L, 
12L, 98L, 45L, 87L, 65L, 12L, 8L)), class = "data.frame", row.names = c(NA, 
-20L))

Upvotes: 2

Related Questions