Reputation: 3768
Say I have a sorted data frame with a distance variable d indicating the distance between consecutive measures in variable a.
library(dplyr)
set.seed(1)
df <-
data.frame(a=sort(sample(2:20,8))) %>%
mutate(d = a-lag(a))
This gives:
> df
a d
1 5 NA
2 7 2
3 8 1
4 9 1
5 11 2
6 14 3
7 15 1
8 16 1
I am trying to add a kind of counter/grouping variable g that increments whenever d is larger than, say, 2. g could take values like g1, g2, etc. In other words, I would like to "increase" g each time d > 2. For the data above we would get:
> df
a d g
1 5 NA g1
2 7 2 g1
3 8 1 g1
4 9 1 g1
5 11 2 g1
6 14 3 g2
7 15 1 g2
8 16 1 g2
I thought of using a function with a global side effect along these lines (and yes, this is generally a bad idea, but I could not think of anything else):
f <- function(x){
if(x)
g <<- g +1
return(paste0('g', g))
}
And then do:
g=0
df %>%
mutate(g = ifelse(is.na(d)|d>2, f(T), f(F)))
But g is not increased inside mutate (or sapply). In real-world data I might have 1000s of g groups.
Upvotes: 0
Views: 60
Reputation: 39174
A solution using dplyr and data.table. df2 is the final output.
library(dplyr)
library(data.table)
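df2 <- df %>%
# Flag rows where the gap exceeds 2 (all other rows, including the first, become NA)
mutate(Large2 = ifelse(d > 2, 1, NA)) %>%
# Number each run of identical flags
mutate(RunID = rleid(Large2)) %>%
# Merge each flagged run with the unflagged run that follows it
# (note: consecutive rows with d > 2 share one run, so they end up in the same group)
mutate(ID = ifelse(RunID %% 2 == 0, RunID + 1, RunID)) %>%
mutate(g = paste0("g", group_indices(., ID))) %>%
select(a, d, g)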
Upvotes: 0
Reputation: 51612
You can try,
with(df, paste0('g', cumsum(replace(d, is.na(d), 0) > 2)+1))
#[1] "g1" "g1" "g1" "g1" "g1" "g2" "g2" "g2"
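As a rough sketch of how the same cumsum idea can be stored as a column with a configurable cutoff: the values of a below are copied from the question's printed output instead of being regenerated with sample(), and the names threshold and jump are made up for this illustration.

```r
# Values of `a` hard-coded from the question's printed output.
df <- data.frame(a = c(5, 7, 8, 9, 11, 14, 15, 16))
df$d <- c(NA, diff(df$a))   # distance to the previous row, NA for the first row

threshold <- 2              # hypothetical name for the cutoff

# TRUE on rows where a new group starts; the leading NA counts as "no jump".
jump <- !is.na(df$d) & df$d > threshold

# The running count of jumps, plus 1, gives the group number.
df$g <- paste0("g", cumsum(jump) + 1)

df$g
#[1] "g1" "g1" "g1" "g1" "g1" "g2" "g2" "g2"
```

Because every row with d above the cutoff adds one to the running count, two large gaps in a row start two separate groups. The side-effect version in the question most likely fails because ifelse() evaluates f(T) and f(F) once each as whole arguments, not once per row.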
Upvotes: 2