Reputation: 369
I am new to R and have a question about adding a new variable to a table. I have data sequences that start with 10 and end with 20, and these sequences occur several times.
Is there a way to group these sequences continuously?
Example:
The data in the column looks like this:
10 3 15 15 19 19 20 20 10 10 11 17 20 ...
I would like to get output like this:
10 group 1
3 group 1
15 group 1
15 group 1
19 group 1
19 group 1
20 group 1
20 group 1
10 group 2
10 group 2
11 group 2
17 group 2
20 group 2
...
Is it possible to program something like that?
Thanks a lot for your help!!
Upvotes: 3
Views: 65
Reputation: 83275
Using base R, you can detect the sequences and create a grouping variable with cumsum and head:
df$grp <- cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20)
gives:
> df
     x grp
 1: 10   1
 2:  3   1
 3: 15   1
 4: 15   1
 5: 19   1
 6: 19   1
 7: 20   1
 8: 20   1
 9: 10   2
10: 10   2
11: 11   2
12: 17   2
13: 20   2
What this does:
df$x == 10 detects the 10's.
c(20, head(df$x, -1)) == 20 detects whether the previous value is equal to 20. The first value is set to 20 because there is no preceding value for the first value of df$x.
Combining both with & gives a logical vector indicating which values in df$x are equal to 10 and for which the preceding value is equal to 20.
Wrapping that in cumsum gives a grouping value.
Or with data.table:
library(data.table)
setDT(df)[, grp := cumsum(x == 10 & c(20, head(x, -1)) == 20)][]
Or with dplyr:
library(dplyr)
df %>%
  mutate(grp = cumsum(x == 10 & lag(x, default = 20) == 20))
You can use paste/paste0 to add text to the group label:
paste0('group_', cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20))
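As a small sketch of the difference between the two: paste inserts a space between its arguments by default, which matches the "group 1" style asked for in the question, while paste0 inserts nothing. The column names label_space and label_underscore are only illustrative.

```r
# Same sample data as used in this answer
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

# Numeric group id, as in the base R solution
grp <- cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20)

# paste() separates its arguments with a space by default
df$label_space <- paste("group", grp)

# paste0() concatenates without a separator
df$label_underscore <- paste0("group_", grp)

unique(df$label_space)
unique(df$label_underscore)
```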
Used data:
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))
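To make the explanation of the base R one-liner easier to follow, here is a sketch that splits it into its intermediate steps. The variable names (is_ten, prev, prev_is_twenty, starts) are made up for illustration and are not part of the original solution.

```r
# Same sample data as above
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

# Step 1: TRUE wherever the value itself is 10
is_ten <- df$x == 10

# Step 2: shift x one position down so each row sees the previous value;
# the first row has no predecessor, so it is padded with 20
prev <- c(20, head(df$x, -1))
prev_is_twenty <- prev == 20

# Step 3: a new group starts where a 10 directly follows a 20
starts <- is_ten & prev_is_twenty
which(starts)

# Step 4: the running count of group starts is the group id
df$grp <- cumsum(starts)
df
```

Note that a 10 that follows another 10 (row 10 in this data) does not start a new group, because its preceding value is not 20.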
Upvotes: 2
Reputation: 907
Try this. x holds your numeric values and y will hold your groups.
x<-0:20
y<-NA
df1<-data.frame(x,y)
group1<-(x>10)
group2<-(x<=10)
df1$y[group1]<-"Group1"
df1$y[group2]<-"Group2"
df1
Upvotes: 0