Reputation: 1393
I have data collected at time intervals from a database. The metrics are counters, as in ever-increasing. To get the value of metric for a given time, you have to subtract one row from the previous version of the same row.
Example:
TS INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:20:36 1 log file sync 1 756873
2014-01-29 17:20:36 1 log file sync 2 15627
2014-01-29 17:20:36 1 log file sync 4 2925
2014-01-29 17:21:03 1 log file sync 1 761063
2014-01-29 17:21:03 1 log file sync 2 15659
2014-01-29 17:21:03 1 log file sync 4 2929
Desired Output:
TS INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT
2014-01-29 17:21:03 1 log file sync 1 4190
2014-01-29 17:21:03 1 log file sync 2 32
2014-01-29 17:21:03 1 log file sync 4 4
TS is the time when the metric was collected. INST_ID, EVENT, and WAIT_TIME_MILLI are static identifiers. I want to calculate the delta of the WAIT_COUNT from one TS to the next.
I've simplified the data a bit, but if it's important there are many events and can be multiple INST_IDs.
Here's the test data frame:
structure(list(TS = structure(c(1391034063.541, 1391034063.541,
1391034063.541, 1391034036.136, 1391034036.136, 1391034036.136
), class = c("POSIXct", "POSIXt")), INST_ID = c(1, 1, 1, 1, 1,
1), EVENT = c("log file sync", "log file sync", "log file sync",
"log file sync", "log file sync", "log file sync"), WAIT_TIME_MILLI = c(1,
2, 4, 1, 2, 4), WAIT_COUNT = c(761063, 15659, 2929, 756873, 15627,
2925)), .Names = c("TS", "INST_ID", "EVENT", "WAIT_TIME_MILLI",
"WAIT_COUNT"), class = "data.frame", row.names = c(NA, 6L))
Upvotes: 1
Views: 3444
Reputation: 52697
@mlt's suggestion implemented in data.table
:
library(data.table)
dt <- data.table(df, key="TS") # `key` orders dt by TS ascending
dt[,
list(
TS=tail(TS, -1L), # all but first
WAIT_COUNT=diff(WAIT_COUNT)), # differences in WAIT_COUNT
by=list(INST_ID, EVENT, WAIT_TIME_MILLI) # split by these fields
]
# INST_ID EVENT WAIT_TIME_MILLI TS WAIT_COUNT
# 1: 1 log file sync 1 2014-01-29 17:21:03 4190
# 2: 1 log file sync 2 2014-01-29 17:21:03 32
# 3: 1 log file sync 4 2014-01-29 17:21:03 4
Basically, you break up your data by INST_ID/EVENT/WAIT_TIME, and then for each group diff
all the values and drop the first time stamp.
EDIT: somehow failed to notice the dput
at the end.
Upvotes: 3
Reputation: 5249
If your data is a data.frame called dat
library(dplyr)
dat <- arrange(dat, WAIT_TIME_MILLI, TS)
dat <- group_by(dat, WAIT_TIME_MILLI)
dat <- mutate(dat, diff = WAIT_COUNT - lag(WAIT_COUNT))
filter(dat, !is.na(diff))
or:
library(dplyr)
dat %.%
arrange(WAIT_TIME_MILLI, TS) %.%
group_by(WAIT_TIME_MILLI) %.%
mutate(diff = WAIT_COUNT - lag(WAIT_COUNT)) %.%
filter(!is.na(diff))
Upvotes: 5