Reputation: 11
I want to use one column which indicates the start point for each sample and then flag the points (rows) that follow the start point up until a maximum amount of time is reached.
For example - my data (d) looks like:
> head(d)
  Sample Seconds Value FLAG
1      A     356     1    1
2      A     357     1   NA
3      A     358     9   NA
4      A     359     4   NA
5      A     400     1   NA
6      A     401     3   NA
A reproducible copy is here:
d <- structure(list(Sample = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), .Label = c("A", "B", "C"), class = "factor"), Seconds = c(356L,
357L, 358L, 359L, 400L, 401L, 402L, 403L, 2955L, 2957L, 2959L,
3001L, 3002L, 3004L, 2548L, 2549L, 2552L, 2553L, 2554L, 2555L,
2556L, 2557L, 2558L), Value = c(1L, 1L, 9L, 4L, 1L, 3L, 7L, 2L,
25L, 17L, 23L, 47L, 34L, 15L, 30L, 16L, 17L, 12L, 6L, 8L, 6L,
6L, 5L), FLAG = c(1L, NA, NA, NA, NA, NA, NA, NA, 1L, NA, NA,
NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Sample",
"Seconds", "Value", "FLAG"), class = "data.frame", row.names = c(NA,
-23L))
I only want the first five seconds of data for each sample. The flag indicates the first row of the sample (keep in mind that this is a simplified version; my real data requires that I set up a flag handle to find the start points). I want to take the row with the start point (FLAG = 1), look at d$Seconds, and put a 1 in each row that falls within a 5-second window of the start point. I can't just flag the next five rows, because the readings are not evenly spaced: some samples have a point at 4 and then at 6 seconds from the start. I am working with a large dataset, so I am also trying to avoid a for loop. Any ideas? (Sorry for the data format. I haven't posted before and it wouldn't let me put in an image.)
Upvotes: 1
Views: 151
Reputation: 93813
Here's one method in base R using by:
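# Assumes one flagged row per sample, and rows already grouped by Sample
# in factor-level order, so the unlisted result lines up with d's rows.
d$within5 <- unlist(
  by(
    d,
    d$Sample,
    # TRUE for rows at or before (start + 5), where start is the flagged row
    function(x) x$Seconds <= (x$Seconds[!is.na(x$FLAG)] + 5)
  )
)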
Result:
> head(d)
  Sample Seconds Value FLAG within5
1      A     356     1    1     TRUE
2      A     357     1   NA     TRUE
3      A     358     9   NA     TRUE
4      A     359     4   NA     TRUE
5      A     400     1   NA    FALSE
6      A     401     3   NA    FALSE
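If the rows might not be sorted by Sample, a lookup with match is one possible variation on the same idea. The sketch below is only an illustration: the toy data frame, the starts table, and the FLAG5 column are hypothetical names and values (not the asker's data). It assumes exactly one flagged row per sample and, like the by() version, compares the raw Seconds values as given. It writes 1/NA rather than TRUE/FALSE to follow the FLAG convention in the question.

```r
# Hypothetical toy data in the same shape as d; Sample B is deliberately
# interleaved with A to show that row order does not matter here.
toy <- data.frame(
  Sample  = c("A", "B", "A", "A", "B", "A", "B"),
  Seconds = c(10L, 50L, 12L, 15L, 54L, 16L, 56L),
  FLAG    = c(1L, 1L, NA, NA, NA, NA, NA)
)

# One row per sample: the Seconds value on the flagged (start) row.
# Assumption: each sample has exactly one non-NA FLAG.
starts <- toy[!is.na(toy$FLAG), c("Sample", "Seconds")]

# Look up each row's start time by its Sample (vectorised, no loop).
start <- starts$Seconds[match(toy$Sample, starts$Sample)]

# 1 inside the window [start, start + 5], NA outside.
in_window <- toy$Seconds >= start & toy$Seconds <= start + 5L
toy$FLAG5 <- ifelse(in_window, 1L, NA_integer_)

print(toy)
```

Because the start time is looked up per row rather than returned per group, the result does not depend on how the data frame is ordered.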
Upvotes: 1