Jael

Reputation: 369

Categorize data sequences continuously

I am new to R and have a question about adding a new variable to a table. My data contains sequences that start with 10 and end with 20, and these sequences occur several times.

Is there a way to group these sequences continuously?

Example:

The data in the column looks like this:

10 3 15 15 19 19 20 20 10 10 11 17 20  ...

I would like to have output like this:

10 group 1
3  group 1
15 group 1
15 group 1
19 group 1
19 group 1
20 group 1
20 group 1
10 group 2
10 group 2
11 group 2
17 group 2
20 group 2
...

Is it possible to program something like that?

Thanks a lot for your help!!

Upvotes: 3

Views: 65

Answers (2)

Jaap

Reputation: 83275

Using base R you can detect the sequences and create a grouping variable with cumsum and head:

df$grp <- cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20)

gives:

> df
     x grp
 1: 10   1
 2:  3   1
 3: 15   1
 4: 15   1
 5: 19   1
 6: 19   1
 7: 20   1
 8: 20   1
 9: 10   2
10: 10   2
11: 11   2
12: 17   2
13: 20   2

What this does:

  • df$x == 10 detects the 10's
  • c(20, head(df$x, -1)) == 20 detects whether the previous value is equal to 20. The first value is set to 20 because there is no preceding value for the first value of df$x.
  • By combining these two with & you get a logical vector indicating which values in df$x are equal to 10 and are also preceded by a 20.
  • Wrapping that in cumsum you get a grouping value.
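
To make the steps above concrete, here is a minimal sketch that stores each intermediate result in its own variable. The variable names (is_ten, prev, after_20, starts) are only illustrative and are not part of the original one-liner:

```r
# Same data as used in this answer
x <- c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20)

is_ten   <- x == 10              # TRUE where the value is 10
prev     <- c(20, head(x, -1))   # previous value; 20 stands in before the first element
after_20 <- prev == 20           # TRUE where the previous value is 20
starts   <- is_ten & after_20    # TRUE only where a new sequence begins
grp      <- cumsum(starts)       # running count of sequence starts = group number

which(starts)   # positions where a new group begins
grp             # group number for every element
```

Note that the second 10 in a row (position 10 here) does not start a new group, because the value before it is 10 and not 20.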

Or with data.table:

library(data.table)
setDT(df)[, grp := cumsum(x == 10 & c(20, head(x, -1)) == 20)][]

Or with dplyr:

library(dplyr)
df %>% 
  mutate(grp = cumsum(x == 10 & lag(x, default = 20) == 20))

You can use paste/paste0 to add text to the group-label:

paste0('group_', cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20))
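
If you want the labels stored as a column in the "group 1" style from the question, a sketch along these lines should work. The column name label is just an assumption for illustration:

```r
df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

# Numeric group index, computed as in the base R approach
grp <- cumsum(df$x == 10 & c(20, head(df$x, -1)) == 20)

# paste() separates its arguments with a space by default
df$label <- paste("group", grp)

unique(df$label)
```

Use paste0 instead if you do not want a separator, or pass sep = "_" to paste for labels with an underscore.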

Used data:

df <- data.frame(x = c(10, 3, 15, 15, 19, 19, 20, 20, 10, 10, 11, 17, 20))

Upvotes: 2

antR

Reputation: 907

Try this. x holds your numeric values and y will hold the group labels.

x <- 0:20
y <- NA
df1 <- data.frame(x, y)
group1 <- (x > 10)    # TRUE for values above 10
group2 <- (x <= 10)   # TRUE for values of 10 or below
df1$y[group1] <- "Group1"
df1$y[group2] <- "Group2"
df1

Upvotes: 0
