Funkeh-Monkeh

Reputation: 661

Get sum of events in one day

I have a dataset like the one shown below (15 years of half-hourly interval data) that records the occurrence of an event (1 means it happened, 0 means it did not):

Date                    Event
2002-04-27 19:30:00      0
2002-04-27 20:00:00      0
2002-04-27 20:30:00      0
2002-04-27 21:00:00      0
2002-04-27 21:30:00      1
2002-04-27 22:00:00      1
2002-04-27 22:30:00      0
2002-04-27 23:00:00      0
2002-04-27 23:30:00      1
2002-04-28 00:00:00      1
2002-04-28 00:30:00      1
2002-04-28 01:00:00      1
2002-04-28 01:30:00      0
2002-04-28 02:00:00      0
2002-04-28 02:30:00      0
2002-04-28 03:00:00      0
2002-04-28 03:30:00      0
2002-04-28 04:00:00      0
2002-04-28 04:30:00      0
2002-04-28 05:00:00      0
2002-04-28 05:30:00      0
2002-04-28 06:00:00      0
2002-04-28 06:30:00      0
2002-04-28 07:00:00      0

What I would like to do is calculate, for each day (e.g. 2002-04-27), the number of events that occurred. However, consecutive 1's count as just one event. This also applies to runs of 1's that cross midnight: for example, 2002-04-27 23:30:00 has a 1 and so does 2002-04-28 00:00:00, but that should be counted as a single event that occurred on 2002-04-27. An output like the one below would be ideal.

Date           No_Event
2002-04-27       2
2002-04-28       0

So, how would I go about doing this? Any help is very much appreciated.

Upvotes: 4

Views: 773

Answers (3)

ExperimenteR

Reputation: 4473

Using lubridate (to group by day) and data.table:

library(data.table)
library(lubridate)
setDT(df)
df[Event!=shift(Event, fill=0), sum(Event), by=floor_date(Date, unit="day")]

#   floor_date V1
#1: 2002-04-27  2
#2: 2002-04-28  0

df used in the above example:

df <- data.frame(Date = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"), as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
                 Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L))
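One thing to be aware of with the one-liner above: because the filter in `i` keeps only rows where Event changes, a day with no changes at all has no rows left to group and will not appear in the result. The sketch below is one possible variant, not the answerer's code. It stores the "event starts here" flag as a column (the name `start` is made up for illustration) and groups every row, using `as.Date` in place of `floor_date`. The three-day toy data is also invented for the demonstration.

```r
library(data.table)

# Hypothetical toy data: three days, two readings per day, GMT assumed.
# The third day contains only zeros and no transitions.
dt <- data.table(
  Date  = seq(as.POSIXct("2002-04-27 00:00:00", tz = "GMT"),
              by = "12 hours", length.out = 6),
  Event = c(0L, 1L, 1L, 0L, 0L, 0L)
)

# Filter-then-group, as in the one-liner: the all-zero day drops out
filtered <- dt[Event != shift(Event, fill = 0L), sum(Event),
               by = .(Date = as.Date(Date))]

# Flag the first 1 of each run, then group all rows so every day is kept
dt[, start := Event == 1L & shift(Event, fill = 0L) == 0L]
full <- dt[, .(No_Event = sum(start)), by = .(Date = as.Date(Date))]

print(filtered)
print(full)
```

If every day in the real data is guaranteed to contain at least one change, the two forms give the same counts.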

Upvotes: 3

G. Grothendieck

Reputation: 270195

Here are a few different ways to do it:

1) base Here is a base solution. First we create a logical vector which is TRUE if the current event is 1 and the prior event is 0 (which occurs when diff == 1) and then we sum it by Date:

No_Events <- tapply(diff(c(0, df$Event)) == 1, as.Date(df$Date), sum)

giving:

> No_Events
2002-04-27 2002-04-28 
         2          0 

If it's important that the output be a data frame then try as.data.frame(No_Events) or data.frame(Date = as.Date(names(No_Events)), No_Event = unname(No_Events)) .
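To make the intermediate step visible, here is a small sketch that rebuilds the sample data from the question (GMT is assumed) and inspects the logical vector before summing it. The names `starts` and `event_starts` are introduced only for this illustration.

```r
# Sample data from the question: 24 half-hourly readings, GMT assumed
df <- data.frame(
  Date = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
             as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 12))
)

# TRUE where a run of 1s begins: current value is 1 and the previous is 0.
# Prepending 0 treats the series as starting outside of an event.
starts <- diff(c(0, df$Event)) == 1

# Timestamps at which an event begins
event_starts <- df$Date[starts]
print(event_starts)

# Count the starts per calendar day
No_Events <- tapply(starts, as.Date(df$Date), sum)
print(No_Events)
```

Because each run is attributed to the timestamp of its first 1, the run that crosses midnight is counted on 2002-04-27 only.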

1a) Here is a variation using aggregate instead of tapply where we first create a data frame tr that contains a Date column with just dates and a No_Event column marking any 1 not preceded by a 1. Then we perform the aggregate:

tr <- transform(df, No_Event = diff(c(0, df$Event)) == 1, Date = as.Date(Date))
aggregate(No_Event ~ Date, tr, sum)

giving:

        Date No_Event
1 2002-04-27        2
2 2002-04-28        0

2) zoo Here is a zoo solution. Read the data frame into a zoo object, merge it with a diff==1 column filling appropriately and aggregate:

library(zoo)
z <- read.zoo(df, tz = "")
m <- merge(z, No_Event = diff(z) == 1, fill = coredata(z[1]))
z.ag <- aggregate(m, as.Date(format(time(z))), sum)

giving:

> z.ag
           z No_Event
2002-04-27 3        2
2002-04-28 3        0

Ignore the z column or remove it via z.ag[, -1] or via z.ag[, -1, drop = FALSE] .

Upvotes: 3

Dominic Comtois

Reputation: 10421

First make a "day" column:

dat$day <- strftime(x = dat$Date, format = "%D") # %D gives mm/dd/yy; try %F (yyyy-mm-dd) as well

Find consecutive 1's and keep only the first

for(i in nrow(dat):2) {
  if(dat$Event[i]==1 && dat$Event[i-1]==1)
      dat$Event[i] <- 0
}

Then aggregate the results:

by(data = dat$Event, INDICES = dat$day, FUN = sum)

dat$day: 04/27/02
[1] 2
-----------------------------------------------------------------
dat$day: 04/28/02
[1] 0
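With 15 years of half-hourly data the loop visits every row, so a vectorized form of the same "keep only the first 1 of each run" step may be worth considering. The following is a sketch under the assumption that `dat` has the same layout as the question's sample (rebuilt here with GMT timestamps); `prev`, `first_only` and `loop_result` are names made up for the comparison.

```r
# Sample data from the question, GMT assumed
dat <- data.frame(
  Date = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
             as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 12))
)

# Vectorized: a 1 survives only if the previous reading was 0
prev <- c(0L, head(dat$Event, -1))
first_only <- as.integer(dat$Event == 1L & prev == 0L)

# The loop from the answer, applied to a copy for comparison
loop_result <- dat$Event
for (i in length(loop_result):2) {
  if (loop_result[i] == 1 && loop_result[i - 1] == 1)
    loop_result[i] <- 0
}

# Both approaches should mark the same rows
print(all(loop_result == first_only))
```

The `first_only` vector can then be passed to `by()` or `tapply()` in place of the modified `dat$Event`.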

Upvotes: 1
