Colin
Colin

Reputation: 303

R: Moving sum of one column in a data frame based on values in other column

I want to calculate a moving sum of one column (populated with ones and zeroes), but only when the value in a corresponding column (time) is within a (moving) range of values.

My data looks like this:

values <- c(1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0)
seconds <- c(0.0, 1.0, 2.5, 3.0, 5.5, 6.0, 6.5, 7.0, 8.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.5, 16.0, 17.0, 18.0, 19.0, 20.0)
data <- data.frame(values, seconds)

Say I want to sum every 5 seconds worth of data in the 'values' column. Then my first 5-second sum (seconds >=0 & seconds <= 5) would be:

1 (because it corresponds to a 'seconds', 0.0, within the interval of interest)

+

0 (corresponds to 1.0 in 'seconds')

+

0 (2.5)

+

0 (3.0)

= 1

STOPs here because the next value (1) corresponds to 5.5 seconds, outside of the interval.

The next 5-second interval (seconds >= 1 & seconds <= 6) would equal:

0 + 0 + 0 + 1 + 1 = 2

3rd interval:

(seconds >= 2.5 & seconds <= 7.5) = 0 + 0 + 1 + 1 + 0 + 1 = 3

and so on.

I'm an R noob, so this is the method I'm using to calculate it (and it is super slow, so I know there must be a better way):

for(i in 1:20){movsum[i] <- sum(subset(data, seconds >= (seconds[i] - 5.0) & seconds <= seconds[i])$values)}

Thanks for your help. Let me know if there's anything I should clarify.

Upvotes: 1

Views: 689

Answers (2)

Henrik
Henrik

Reputation: 67818

You may try some functions from the zoo package:

library(zoo)

# convert your data to a zoo time series
z <- read.zoo(data, index = "seconds")

# create an empty, regular time series,
# which contains the full time range, in steps of 0.5 sec
z0 <-  zoo(, seq(from = start(z), to = end(z), by = 0.5))

# 'expand' the irregular, original data to a regular series, by merging it with z0 
z2 <- merge(z, z0)

# apply the desired function (sum) to a rolling window of width 11
# (number of observations in each window)
# move the time frame in steps of 2 (by = 2) which correspond to 1 sec 
# use partial = TRUE, to allow the window to pass outside the time range
rollapply(z2, width = 11, by = 2, FUN = sum, na.rm = TRUE,
          align = "left", partial = TRUE)

# 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
# 1  2  3  3  3  3  2  2  1  2  2  3  3  2  2  1  0  0  0  0  0

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92300

Here's a possible data.table::foverlaps solution. The idea here is to create 5 seconds interval look up table and then lookup within data which values fall in each interval.

Choose an interval

int <- 5 ## 5 seconds

The load the package, add additional (identical) column to data in order to set boundaries, create a new data set which will have the desired boundaries per row, run foverlaps, key data in order to enable the binary join, find the corresponding values in data$values and sum them per each interval, something like the following seem to work

library(data.table)
setkey(setDT(data)[, seconds2 := seconds], seconds, seconds2)
lookup <- data[, .(seconds, seconds2 = seconds + int)]
res <- foverlaps(lookup, data, which = TRUE)[, values := data$values[yid]]
res[, .(SumValues = sum(values)), by = .(SecInterval = xid)]
#     SecInterval SumValues
#  1:           1         1
#  2:           2         2
#  3:           3         3
#  4:           4         3
#  5:           5         3
#  6:           6         2
#  7:           7         1
#  8:           8         2
#  9:           9         1
# 10:          10         2
# 11:          11         3
# 12:          12         3
# 13:          13         2
# 14:          14         2
# 15:          15         1
# 16:          16         0
# 17:          17         0
# 18:          18         0
# 19:          19         0
# 20:          20         0

Upvotes: 3

Related Questions