R: Replacing data by group in dataframe

Question

I have a dataset of this style:

id1  id2  start_line end_line content   
A    B    1          1        "aaaa" 
A    B    4          4        "aa mm" 
A    B    5          5        "boool"
A    B    6          6        "omw"   
C    D    6          6        "hear!" 
C    D    7          7        " me out!"
C    D    21         21       "hello"

I need to mutate this in several ways, based on specific criteria. In particular, for rows that have the same id1, the same id2 and consecutive start_line values:

The start_line needs to be changed to the first start_line in the group
The end_line needs to be changed to the end_line of the last row in the group
A new column real_line needs to hold the original start_line
A new column cid needs to hold a numeric ID calculated by group of id1, id2, start_line, end_line

So, the expected result would be:

id1  id2  start_line end_line content      real_line   cid
A    B    1          1        "aaaa"        1          1
A    B    4          6        "aa mm"       4          2
A    B    4          6        "boool"       5          2
A    B    4          6        "omw"         6          2
C    D    6          7        "hear!"       6          3
C    D    6          7        " me out!"    7          3
C    D    21         21       "hello"       21         4

I can add real_line by simply copying the original column, but I don't know how to replace start_line and end_line without summarising.

akrun · Accepted Answer

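We group by 'id1' and 'id2', then add a second grouping variable, 'grp', that increments whenever the difference between adjacent 'start_line' values is not 1. Within each of these groups, 'start_line' is replaced by its first value and 'end_line' by its last, and 'cid' is created from the group index with cur_group_id().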
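
The run-detection step can be looked at in isolation on a plain vector. This is only a small sketch using the start_line values of the A/B rows from the question; the names new_run and grp are chosen here for illustration.

```r
# start_line values for the id1 = "A", id2 = "B" rows in the question
start_line <- c(1, 4, 5, 6)

# diff() gives the gap to the previous row; a gap other than 1 starts a new run.
# The leading TRUE marks the first row as the start of the first run.
new_run <- c(TRUE, diff(start_line) != 1)

# cumsum() over the logical flags turns the run starts into a run index
grp <- cumsum(new_run)
grp
# [1] 1 2 2 2
```

Rows that share a run index are the ones whose start_line values are consecutive, so the index can serve as an extra grouping variable.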

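library(dplyr)

df %>%
    group_by(id1, id2) %>%
    # start a new run whenever start_line does not follow the previous row by 1
    group_by(grp = cumsum(c(TRUE, diff(start_line) != 1)),
             .add = TRUE) %>%
    # keep the original line, then give every row the run's first/last line
    mutate(real_line = start_line,
           start_line = first(start_line),
           end_line = last(end_line)) %>%
    # number each (id1, id2, grp) group
    mutate(cid = cur_group_id()) %>%
    ungroup() %>%
    select(-grp)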

-output

# A tibble: 7 x 7
#  id1   id2   start_line end_line content      real_line   cid
#1 A     B              1        1 "aaaa"               1     1
#2 A     B              4        6 "aa mm"              4     2
#3 A     B              4        6 "boool"              5     2
#4 A     B              4        6 "omw"                6     2
#5 C     D              6        7 "hear!"              6     3
#6 C     D              6        7 " me out!"           7     3
#7 C     D             21       21 "hello"            21     4
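
For a self-contained check, the data can be rebuilt by hand. The sketch below is a variant rather than the exact code above: the data frame df is an assumed reconstruction of the question's table (integer line columns are a guess), and the run index is computed in a separate mutate() step after group_by(id1, id2). As far as I understand the group_by() documentation, expressions written inside group_by() are evaluated on the ungrouped data, so the separate step is one way to make sure the runs are detected within each id1/id2 pair. On this data both versions give the same result. It needs dplyr 1.0.0 or later for .add and cur_group_id().

```r
library(dplyr)

# Assumed reconstruction of the question's data; column types are a guess
df <- data.frame(
  id1        = c("A", "A", "A", "A", "C", "C", "C"),
  id2        = c("B", "B", "B", "B", "D", "D", "D"),
  start_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  end_line   = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  content    = c("aaaa", "aa mm", "boool", "omw", "hear!", " me out!", "hello")
)

res <- df %>%
  group_by(id1, id2) %>%
  # run index computed per id1/id2 group: a gap other than 1 starts a new run
  mutate(grp = cumsum(c(TRUE, diff(start_line) != 1))) %>%
  group_by(grp, .add = TRUE) %>%
  mutate(real_line  = start_line,          # keep the original line number
         start_line = first(start_line),   # first line of the run
         end_line   = last(end_line),      # last line of the run
         cid        = cur_group_id()) %>%  # index of the (id1, id2, grp) group
  ungroup() %>%
  select(-grp)

# Compare against the expected result given in the question
stopifnot(
  identical(res$start_line, c(1L, 4L, 4L, 4L, 6L, 6L, 21L)),
  identical(res$end_line,   c(1L, 6L, 6L, 6L, 7L, 7L, 21L)),
  identical(res$real_line,  c(1L, 4L, 5L, 6L, 6L, 7L, 21L)),
  identical(res$cid,        c(1L, 2L, 2L, 2L, 3L, 3L, 4L))
)
```

If the rows are not already sorted by id1, id2 and start_line, an arrange() step before the grouping would probably be needed, since diff() only compares neighbouring rows.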
