Reputation: 43
I have a data frame, sorted by groupID
and date
:
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(1,1,25,1,1,25,1,25,1))
> d1
groupID date value
1 1 1
1 2 1
1 3 25
1 4 1
1 5 1
3 6 25
3 7 1
3 8 25
3 9 1
I want to create two new columns:
Desired output:
groupID date value Prev1s After1s
1 1 1
1 2 1
1 3 25 2 2
1 4 1
1 5 1
3 6 25 0 1
3 7 1
3 8 25 1 1
3 9 1
I'm able to do the same using Excel by creating a counter and taking the previous value. I have tried achieving the same in R using sum
, shift()
but in vain.
Upvotes: 2
Views: 142
Reputation: 83275
An alternative using the data.table
-package in combination with the rle
-function:
library(data.table)
setDT(d1)[, c('prev1s','after1s') := {p <- a <- rle(value);
i <- p$values == 25;
p$values[i] <- shift(p$lengths, fill = 0)[i];
a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i];
p$values[!i] <- a$values[!i] <- NA;
list(inverse.rle(p),inverse.rle(a))},
by = groupID][]
which gives:
groupID date value prev1s after1s 1: 1 1 1 NA NA 2: 1 2 1 NA NA 3: 1 3 25 2 2 4: 1 4 1 NA NA 5: 1 5 1 NA NA 6: 3 6 25 0 1 7: 3 7 1 NA NA 8: 3 8 25 1 1 9: 3 9 1 NA NA
Upvotes: 0
Reputation: 18435
You can do this with dplyr
...
library(dplyr)
#first set up some grouping variables based on runs before and after 25s
d1 <- d1 %>% mutate(PrevGp=cumsum(lag(value==25,default = 1)),
AfterGp=cumsum(value==25)) %>%
#use these to calculate the values you want for each group
group_by(groupID,PrevGp) %>% mutate(Prev1s=sum(value)-25) %>%
group_by(groupID,AfterGp) %>% mutate(After1s=sum(value)-25) %>%
ungroup() %>%
#remove values (set to "") other than for value==25
mutate(Prev1s=replace(Prev1s,value!=25,""),
After1s=replace(After1s,value!=25,"")) %>%
#and remove the grouping variables
select(-c(PrevGp,AfterGp))
d1
# A tibble: 9 x 5
groupID date value Prev1s After1s
<dbl> <dbl> <dbl> <chr> <chr>
1 1 1 1
2 1 2 1
3 1 3 25 2 2
4 1 4 1
5 1 5 1
6 3 6 25 0 1
7 3 7 1
8 3 8 25 1 1
9 3 9 1
Upvotes: 1