M. Beausoleil

Reputation: 3555

Count the number of elements between 2 dates conditionally on a variable in R

I'm trying to count the number of precipitation values at or below a certain threshold (say, less than or equal to 50) between two dates.

Basically, I have a vector cuts that contains the dates I want to count between, inclusively. I want to use the cuts vector to "subset" the dataset into bins and then count, in each bin, the number of events with 50 mm of rain or less.

I'm using dplyr and a for loop at the moment, but nothing is working.

set.seed(12345)
df = data.frame(date = seq(as.Date("2000/03/01"), as.Date("2002/03/01"), "days"), 
                precipitation = rnorm(length(seq(as.Date("2000/03/01"), as.Date("2002/03/01"), "days")),80,20))
cuts = c("2001-11-25","2002-01-01","2002-02-18","2002-03-01")
for (i in 1:length(cuts)) {
  df %>% summarise(count.prec = if (date > cuts[i] | date < cuts[i+1]) {count(precipitation <= 50)})
}

But I have this error message:

Error: no applicable method for 'group_by_' applied to an object of class "logical"
In addition: Warning message:
In if (c(11017, 11018, 11019, 11020, 11021, 11022, 11023, 11024,  :
  the condition has length > 1 and only the first element will be used

This is not working either:

for (i in 1:length(cuts)) {
  df %>% if (date > cuts[i] | date < cuts[i+1])%>% summarise(count.prec = count(precipitation <= 50))
}

Upvotes: 3

Views: 1442

Answers (2)

Steven Beaupré

Reputation: 21621

You could try:

df %>%
  group_by(gr = cut(date, breaks = as.Date(cuts))) %>%
  summarise(res = sum(precipitation <= 50))

Which gives:

# A tibble: 4 × 2
          gr   res
      <fctr> <int>
1 2001-11-25     1
2 2002-01-01     4
3 2002-02-18     2
4         NA    40

Or, as mentioned by @Frank, you could replace summarise() with tally(precipitation <= 50).
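The NA row in the output above appears to collect every date that falls outside the breaks, including the last cut date itself, since cut() builds left-closed, right-open bins by default. If the bins should cover only the range spanned by cuts and include the final date, one possible variant is sketched below in base R. The names cuts_d, inside, bin and res are illustrative and not part of the original answer, and the choice to close the last bin on the right is an assumption about what "inclusively" means here.

```r
# Rebuild the example data from the question
set.seed(12345)
dates <- seq(as.Date("2000/03/01"), as.Date("2002/03/01"), "days")
df <- data.frame(date = dates,
                 precipitation = rnorm(length(dates), 80, 20))
cuts <- c("2001-11-25", "2002-01-01", "2002-02-18", "2002-03-01")
cuts_d <- as.Date(cuts)

# Keep only the rows whose date lies within the overall range of the cuts
inside <- df[df$date >= cuts_d[1] & df$date <= cuts_d[length(cuts_d)], ]

# Assign each date to a bin; rightmost.closed = TRUE places the last
# cut date in the final bin instead of leaving it outside
bin <- findInterval(as.numeric(inside$date), as.numeric(cuts_d),
                    rightmost.closed = TRUE)

# Count, per bin, the days with precipitation at or below 50
res <- tapply(inside$precipitation <= 50, bin, sum)
res
```

Here res is a named vector with one count per bin, and there is no NA group because out-of-range dates are dropped before binning.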

Upvotes: 5

akrun

Reputation: 886938

We can try a non-equi join using data.table:

library(data.table)  # v1.9.7+
df2 <- data.table(cuts1 = as.Date(cuts[-length(cuts)]), cuts2 = as.Date(cuts[-1]))
setDT(df)[df2, .(Count = sum(precipitation <=50)),
           on = .(date > cuts1,  date < cuts2), by = .EACHI]
#         date       date Count
#1: 2001-11-25 2002-01-01     1
#2: 2002-01-01 2002-02-18     4
#3: 2002-02-18 2002-03-01     2

Upvotes: 1
