Reputation: 21400
I have a dataframe where one column indicates text lines, which are only partially consecutive:
df <- data.frame(
line = c("0001","0002", "0003", "0011","0012","0234","0235","0236")
)
I want to group the rows based on consecutive line numbers to get this expected result:
df
line grp
1 0001 1
2 0002 1
3 0003 1
4 0011 2
5 0012 2
6 0234 3
7 0235 3
8 0236 3
I've tried to approach this with dplyr's lag function but am stuck there:
library(dplyr)
df %>%
mutate(line = as.numeric(line),
diff = abs(lag(line) - line))
Upvotes: 1
Views: 289
Reputation: 11584
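Does this work? It counts the '0' characters in each line value and gives every run of equal counts its own group id: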
library(dplyr)
library(stringr)
library(data.table)
df %>% mutate(z = str_count(line, '0'), grp = rleid(z)) %>% select(-z)
line grp
1 0001 1
2 0002 1
3 0003 1
4 0011 2
5 0012 2
6 0234 3
7 0235 3
8 0236 3
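One thing worth checking before relying on this: the zero count matches the runs in the sample data, but it does not test whether the numbers are actually consecutive. Here is a minimal sketch with made-up values (the vector x is hypothetical, not from the question), comparing the zero-count grouping against a grouping based on numeric differences:

```r
library(stringr)
library(data.table)

# Hypothetical values: 9, 10, 11 are consecutive, 20 is not
x <- c("0009", "0010", "0011", "0020")

# Grouping by runs of equal zero counts (counts are 3, 3, 2, 3)
by_zeros <- rleid(str_count(x, "0"))

# Grouping by gaps larger than 1 between neighbouring numbers
by_diff <- cumsum(c(TRUE, diff(as.numeric(x)) > 1))

by_zeros  # 1 1 2 3
by_diff   # 1 1 1 2
```

If the two results differ on your real data, the difference-based grouping is probably the one you want.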
Upvotes: 1
Reputation: 388817
Convert the line values to numeric, compute the difference between consecutive values, and start a new group whenever that difference is greater than 1. The leading TRUE makes the first row open group 1.
transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
# line group
#1 0001 1
#2 0002 1
#3 0003 1
#4 0011 2
#5 0012 2
#6 0234 3
#7 0235 3
#8 0236 3
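To see why the one-liner works, it may help to unpack it into separate steps. This is only a sketch on the data from the question; the intermediate names (num, gaps, new_run, grp) are made up for illustration:

```r
# Same data as in the question
df <- data.frame(
  line = c("0001", "0002", "0003", "0011", "0012", "0234", "0235", "0236")
)

num <- as.numeric(df$line)     # 1 2 3 11 12 234 235 236
gaps <- diff(num)              # 1 1 8 1 222 1 1 (one element shorter than num)
new_run <- c(TRUE, gaps > 1)   # TRUE marks the first row of each run
grp <- cumsum(new_run)         # 1 1 1 2 2 3 3 3

df$group <- grp
```

Since diff() returns one value fewer than its input, the TRUE in front both restores the length and starts the count at 1.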
If you want to use dplyr:
library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
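If you would rather stay close to the lag() attempt from the question, something along these lines should give the same grouping. It is a sketch, not the only way to do it; the helper columns num and step are names I made up, and default = first(num) is just one way to avoid the NA that lag() produces in the first row:

```r
library(dplyr)

df <- data.frame(
  line = c("0001", "0002", "0003", "0011", "0012", "0234", "0235", "0236")
)

res <- df %>%
  mutate(
    num  = as.numeric(line),
    # distance to the previous row; the first row gets 0 via the default
    step = num - lag(num, default = first(num)),
    # each step larger than 1 opens a new group; + 1 so numbering starts at 1
    grp  = cumsum(step > 1) + 1
  ) %>%
  select(line, grp)

res
```

Keeping line as character in the result preserves the leading zeros, which the attempt in the question would have dropped by overwriting line with its numeric version.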
Upvotes: 3