Reputation: 177
I have a datatable in R on which I apply a cut()
and table()
. I am able to get the frequency table based on the conditions. But I am getting overall frequencies. I want to get it day wise. I have a column named as timestamp
which have timestamp. Also I have a section
column which has value either A
or B
. How to cut it based on each day each section.
My current output :
Var1 Freq
0-30 1398
30-60 1051
60-80 1006
80-100 36
100> 2
Expected output:
Date Sec Var1 Freq
05-01-2020 A 0-30 1398
05-01-2020 A 30-60 1051
05-01-2020 A 60-80 1006
05-01-2020 A 80-100 36
05-01-2020 A 100> 2
05-01-2020 B 0-30 138
05-01-2020 B 30-60 151
05-01-2020 B 60-80 1056
05-01-2020 B 80-100 3
05-01-2020 B 100> 6
06-01-2020 A 0-30 398
06-01-2020 A 30-60 11
06-01-2020 A 60-80 16
06-01-2020 A 80-100 3666
06-01-2020 A 100> 25
06-01-2020 B 0-30 1948
06-01-2020 B 30-60 233
06-01-2020 B 60-80 33
06-01-2020 B 80-100 433
06-01-2020 B 100> 34
and so on.....
My code :
result = as.data.frame(table(cut(df$temp,breaks=c(0,30,60,80,100,1000),labels=c("0-30","30-60","60-80","80-100","100>"))))
Sample data :
timestamp section temp
01-05-2020 00:01:22 A 45
02-05-2020 00:01:22 A 10
03-05-2020 00:01:22 A 98
04-05-2020 00:01:22 A 23
05-05-2020 00:01:22 A 67
06-05-2020 00:01:22 A 89
07-05-2020 00:01:22 A 100
08-05-2020 00:01:22 A 110
09-05-2020 00:01:22 A 123
10-05-2020 00:01:22 A 12
01-05-2020 00:01:22 B 23
02-05-2020 00:01:22 B 44
03-05-2020 00:01:22 B 55
04-05-2020 00:01:22 B 12
05-05-2020 00:01:22 B 98
06-05-2020 00:01:22 B 45
07-05-2020 00:01:22 B 87
08-05-2020 00:01:22 B 65
09-05-2020 00:01:22 B 12
10-05-2020 00:01:22 B 8
Upvotes: 2
Views: 62
Reputation: 42544
This question has been tagged with data.table
. So, for the sake of completeness, here is also a data.table
solution:
library(data.table)
setDT(df)[, .(Freq = .N),
by = .(Date = as.IDate(timestamp, "%d-%m-%Y"), section,
Var1 = cut(temp, c(0, 30, 60, 80, 100, Inf), c("0-30", "30-60", "60-80", "80-100", "100>")))]
Date section Var1 Freq 1: 2020-05-01 A 30-60 1 2: 2020-05-02 A 0-30 1 3: 2020-05-03 A 80-100 1 4: 2020-05-04 A 0-30 1 5: 2020-05-05 A 60-80 1 6: 2020-05-06 A 80-100 1 7: 2020-05-07 A 80-100 1 8: 2020-05-08 A 100> 1 9: 2020-05-09 A 100> 1 10: 2020-05-10 A 0-30 1 11: 2020-05-01 B 0-30 1 12: 2020-05-02 B 30-60 1 13: 2020-05-03 B 30-60 1 14: 2020-05-04 B 0-30 1 15: 2020-05-05 B 80-100 1 16: 2020-05-06 B 30-60 1 17: 2020-05-07 B 80-100 1 18: 2020-05-08 B 60-80 1 19: 2020-05-09 B 0-30 1 20: 2020-05-10 B 0-30 1
or, in a more programmatic way
breaks <- c(0, 30, 60, 80, 100, Inf)
labels <- stringr::str_replace(paste(head(breaks, -1L), tail(breaks, -1L), sep = "-"), "-Inf", ">")
setDT(df)[, .(Freq = .N), by = .(Date = as.IDate(timestamp, "%d-%m-%Y"), section,
Var1 = cut(temp, breaks, labels))]
library(data.table)
df <- fread(
"timestamp time section temp
01-05-2020 00:01:22 A 45
02-05-2020 00:01:22 A 10
03-05-2020 00:01:22 A 98
04-05-2020 00:01:22 A 23
05-05-2020 00:01:22 A 67
06-05-2020 00:01:22 A 89
07-05-2020 00:01:22 A 100
08-05-2020 00:01:22 A 110
09-05-2020 00:01:22 A 123
10-05-2020 00:01:22 A 12
01-05-2020 00:01:22 B 23
02-05-2020 00:01:22 B 44
03-05-2020 00:01:22 B 55
04-05-2020 00:01:22 B 12
05-05-2020 00:01:22 B 98
06-05-2020 00:01:22 B 45
07-05-2020 00:01:22 B 87
08-05-2020 00:01:22 B 65
09-05-2020 00:01:22 B 12
10-05-2020 00:01:22 B 8")[
, c("timestamp", "time") := .(paste(timestamp, time), NULL)]
Upvotes: 0
Reputation: 886938
Here is one option. We do the grouping by 'Date' and the cut
on the 'temp' column, then summarise
to get the frequency with n()
library(dplyr)
library(lubridate)
df %>%
group_by(Date = as.Date(dmy_hms(timestamp)), section,
Var1 = cut(temp, breaks=c(0,30,60,80,100,1000),
labels=c("0-30","30-60","60-80", "80-100", "100>"))) %>%
summarise(Freq = n())
# A tibble: 20 x 4
# Groups: Date, section [20]
# Date section Var1 Freq
# <date> <chr> <fct> <int>
# 1 2020-05-01 A 30-60 1
# 2 2020-05-01 B 0-30 1
# 3 2020-05-02 A 0-30 1
# 4 2020-05-02 B 30-60 1
# 5 2020-05-03 A 80-100 1
# 6 2020-05-03 B 30-60 1
# 7 2020-05-04 A 0-30 1
# 8 2020-05-04 B 0-30 1
# 9 2020-05-05 A 60-80 1
#10 2020-05-05 B 80-100 1
#11 2020-05-06 A 80-100 1
#12 2020-05-06 B 30-60 1
#13 2020-05-07 A 80-100 1
#14 2020-05-07 B 80-100 1
#15 2020-05-08 A 100> 1
#16 2020-05-08 B 60-80 1
#17 2020-05-09 A 100> 1
#18 2020-05-09 B 0-30 1
#19 2020-05-10 A 0-30 1
#20 2020-05-10 B 0-30 1
df <- structure(list(timestamp = c("01-05-2020 00:01:22", "02-05-2020 00:01:22",
"03-05-2020 00:01:22", "04-05-2020 00:01:22", "05-05-2020 00:01:22",
"06-05-2020 00:01:22", "07-05-2020 00:01:22", "08-05-2020 00:01:22",
"09-05-2020 00:01:22", "10-05-2020 00:01:22", "01-05-2020 00:01:22",
"02-05-2020 00:01:22", "03-05-2020 00:01:22", "04-05-2020 00:01:22",
"05-05-2020 00:01:22", "06-05-2020 00:01:22", "07-05-2020 00:01:22",
"08-05-2020 00:01:22", "09-05-2020 00:01:22", "10-05-2020 00:01:22"
), section = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B"), temp = c(45L,
10L, 98L, 23L, 67L, 89L, 100L, 110L, 123L, 12L, 23L, 44L, 55L,
12L, 98L, 45L, 87L, 65L, 12L, 8L)), class = "data.frame", row.names = c(NA,
-20L))
Upvotes: 2