I have a dataset like the one shown below (15 years of half-hourly interval data) recording the occurrence of an event (1 means it happened, 0 that it did not):
Date Event
2002-04-27 19:30:00 0
2002-04-27 20:00:00 0
2002-04-27 20:30:00 0
2002-04-27 21:00:00 0
2002-04-27 21:30:00 1
2002-04-27 22:00:00 1
2002-04-27 22:30:00 0
2002-04-27 23:00:00 0
2002-04-27 23:30:00 1
2002-04-28 00:00:00 1
2002-04-28 00:30:00 1
2002-04-28 01:00:00 1
2002-04-28 01:30:00 0
2002-04-28 02:00:00 0
2002-04-28 02:30:00 0
2002-04-28 03:00:00 0
2002-04-28 03:30:00 0
2002-04-28 04:00:00 0
2002-04-28 04:30:00 0
2002-04-28 05:00:00 0
2002-04-28 05:30:00 0
2002-04-28 06:00:00 0
2002-04-28 06:30:00 0
2002-04-28 07:00:00 0
What I would like to do is calculate, for each day (e.g. 2002-04-27), the number of events that occurred. However, consecutive 1's count as just one event, and that includes 1's that cross into the next day. For example, 2002-04-27 23:30:00 has a 1, and so does 2002-04-28 00:00:00, but that is considered only one event, which occurred on 2002-04-27. An output like the one below would be ideal:
Date No_Event
2002-04-27 2
2002-04-28 0
So, how would I go about doing this? Any help is very much appreciated.
Using lubridate (for grouping by day) and data.table:
library(data.table)
library(lubridate)
setDT(df)
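# keep only rows where Event changes value (run starts and run ends); summing Event over them counts the starts
df[Event!=shift(Event, fill=0), sum(Event), by=floor_date(Date, unit="day")]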
# floor_date V1
#1: 2002-04-27 2
#2: 2002-04-28 0
df used in the above example:
df <- data.frame(Date=seq(as.POSIXct("2002-04-27 19:30:00", tz="GMT"), as.POSIXct("2002-04-28 07:00:00", tz="GMT"), by="30 min"),
Event=c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L))
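One thing to watch with the filter in the data.table answer: it keeps only rows where Event changes value, so a day on which Event never changes (all 0's, or a run of 1's carried over the whole day) would be missing from the output rather than reported as 0. A minimal base R sketch, assuming the same df layout (a POSIXct Date in GMT and an integer Event), flags run starts on every row before grouping, so every day present in the data gets a count. The helper count_events and the extra all-zero day are hypothetical, added only for illustration:

```r
# Hypothetical helper: number of event starts per calendar day (GMT)
count_events <- function(dates, event) {
  prev   <- c(0L, head(event, -1L))     # previous value, 0 before the first row
  start  <- event == 1L & prev == 0L    # TRUE on the first 1 of each run
  day    <- as.Date(dates, tz = "GMT")  # calendar day of each row
  counts <- tapply(start, day, sum)     # every day present in the data gets a count
  data.frame(Date = as.Date(names(counts)), No_Event = as.vector(counts))
}

# Sample data from the question
df <- data.frame(
  Date  = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
              as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 12))
)

# Hypothetical extra day with no events at all (48 half-hour slots)
extra <- data.frame(
  Date  = seq(as.POSIXct("2002-04-29 00:00:00", tz = "GMT"),
              by = "30 min", length.out = 48),
  Event = 0L
)

all_days <- rbind(df, extra)
res <- count_events(all_days$Date, all_days$Event)
print(res)
```

Because the start flag is computed on all rows, 2002-04-29 appears with a count of 0 instead of being dropped.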
Here are a few different ways to do it:
1) base. Here is a base R solution. First we create a logical vector which is TRUE where the current event is 1 and the prior event is 0 (which is exactly where diff == 1), and then we sum it by date:
No_Events <- tapply(diff(c(0, df$Event)) == 1, as.Date(df$Date), sum)
giving:
> No_Events
2002-04-27 2002-04-28
2 0
If it's important that the output be a data frame, then try as.data.frame(No_Events) or data.frame(Date = as.Date(names(No_Events)), No_Event = unname(No_Events)).
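To see why diff(c(0, df$Event)) == 1 picks out event starts: prepending a 0 makes the differenced vector as long as Event again, and the differences are 1 where a run of 1's begins, -1 where it ends, and 0 elsewhere. A quick check on the first 14 values of the sample Event column:

```r
# First 14 values of the sample Event column from the question
event <- c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L)

d <- diff(c(0L, event))  # same length as event
print(d)                 # 0 0 0 0 1 0 -1 0 1 0 0 0 -1 0

starts <- which(d == 1L) # positions where a run of 1's begins
ends   <- which(d == -1L) # positions of the first 0 after a run
print(starts)            # 5 9
```

Positions 5 and 9 are 21:30 and 23:30 on 2002-04-27, so both events fall on that day even though the second run continues past midnight.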
1a) Here is a variation using aggregate instead of tapply, where we first create a data frame tr that contains a Date column with just dates and a No_Event column marking any 1 not preceded by a 1. Then we perform the aggregate:
tr <- transform(df, No_Event = diff(c(0, df$Event)) == 1, Date = as.Date(Date))
aggregate(No_Event ~ Date, tr, sum)
giving:
Date No_Event
1 2002-04-27 2
2 2002-04-28 0
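Another option, not used in the answers here, is base R's rle(), which splits the series into runs directly. In this sketch (df rebuilt so the block runs on its own), each run of 1's is dated by the row where it starts, so a run crossing midnight is counted on the earlier day; the factor levels make days without any event show up as 0:

```r
df <- data.frame(
  Date  = seq(as.POSIXct("2002-04-27 19:30:00", tz = "GMT"),
              as.POSIXct("2002-04-28 07:00:00", tz = "GMT"), by = "30 min"),
  Event = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 12))
)

runs <- rle(df$Event)
# row index where each run begins
run_start <- cumsum(c(1L, head(runs$lengths, -1L)))
# keep only runs of 1's; their first rows are the event starts
event_rows <- run_start[runs$values == 1L]

day <- format(as.Date(df$Date, tz = "GMT"))
# count starts per day, with every day in the data as a level
No_Event <- table(factor(day[event_rows], levels = unique(day)))
print(No_Event)
```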
2) zoo. Here is a zoo solution. Read the data frame into a zoo object, merge it with a diff == 1 column (filling the first position, where diff has no value, with the first value of the series), and aggregate by date:
library(zoo)
z <- read.zoo(df, tz = "")
m <- merge(z, No_Event = diff(z) == 1, fill = coredata(z[1]))
z.ag <- aggregate(m, as.Date(format(time(z))), sum)
giving:
> z.ag
z No_Event
2002-04-27 3 2
2002-04-28 3 0
Ignore the z column, or remove it via z.ag[, -1] or via z.ag[, -1, drop = FALSE] (the latter keeps a one-column zoo matrix instead of dropping to a plain series).
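In the zoo answer, diff(z) has no value at the first timestamp, so fill = coredata(z[1]) fills that slot with the first observation: the first row counts as an event start exactly when the series begins with a 1. The same idea in plain base R, as a sketch with a made-up series that starts in the middle of an event; it gives the same flags as prepending a 0 in the base answer:

```r
# Made-up series that begins in the middle of an event
x <- c(1L, 1L, 0L, 1L, 0L)

# diff() has no value for the first element; fill it from x[1], as the zoo answer does
is_start <- c(x[1] == 1L, diff(x) == 1L)
print(which(is_start))   # 1 4

# equivalent to the base answer's approach of prepending a 0
same_as_base <- identical(is_start, diff(c(0L, x)) == 1L)
print(same_as_base)      # TRUE
```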
First make a "day" column:
dat$day <- strftime(x = dat$Date, format = "%D") # "%D" gives mm/dd/yy; "%F" gives yyyy-mm-dd, which sorts correctly across years
Find consecutive 1's and keep only the first:
# loop backwards so each 1 is compared with the original (not yet zeroed) value before it
for(i in nrow(dat):2) {
  if(dat$Event[i]==1 && dat$Event[i-1]==1)
    dat$Event[i] <- 0
}
Then aggregate the results:
by(data = dat$Event, INDICES = dat$day, FUN = sum)
dat$day: 04/27/02
[1] 2
-----------------------------------------------------------------
dat$day: 04/28/02
[1] 0
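The backwards loop zeroes every 1 that follows another 1, leaving only the first 1 of each run. The direction matters: a forward loop would zero the second 1 of a run and then fail to recognise the third as a continuation. The same result can be had without a loop; a sketch on the question's Event values, checked against the loop:

```r
event <- c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, rep(0L, 12))

# The answer's backwards loop, applied to a copy
looped <- event
for (i in length(looped):2) {
  if (looped[i] == 1L && looped[i - 1] == 1L) looped[i] <- 0L
}

# Vectorised: keep a 1 only if the previous value (0 before the first row) is 0
vectorised <- event * (c(0L, head(event, -1L)) == 0L)

print(identical(looped, vectorised))  # TRUE
print(sum(vectorised))                # 2
```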