Reputation: 1285
I have a dataset of this style:
id1 id2 start_line end_line content
A B 1 1 "aaaa"
A B 4 4 "aa mm"
A B 5 5 "boool"
A B 6 6 "omw"
C D 6 6 "hear!"
C D 7 7 " me out!"
C D 21 21 "hello"
I need to mutate this several times, with specific criteria. In particular, for rows that have the same id1, the same id2 and consecutive start_line values:
start_line needs to be changed to the first one in the group
end_line needs to be changed to the value of the last row in the group
real_line needs to hold the original start_line
cid needs to hold a numeric ID calculated by group of id1, id2, start_line, end_line
So, the expected result would be:
id1 id2 start_line end_line content real_line cid
A B 1 1 "aaaa" 1 1
A B 4 6 "aa mm" 4 2
A B 4 6 "boool" 5 2
A B 4 6 "omw" 6 2
C D 6 7 "hear!" 6 3
C D 6 7 " me out!" 7 3
C D 21 21 "hello" 21 4
I can add real_line by simply copying the original column, but I don't know how to replace start_line and end_line without summarising.
Upvotes: 1
Views: 827
Reputation: 887891
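We group by 'id1' and 'id2', add a run-based grouping 'grp' that starts a new group whenever the difference between adjacent 'start_line' values is not 1, and then create 'cid' from the resulting groups: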
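library(dplyr)

df %>%
  # group by the id columns first so runs are computed per (id1, id2) pair
  group_by(id1, id2) %>%
  # start a new run whenever start_line does not follow the previous one by exactly 1
  group_by(grp = cumsum(c(TRUE, diff(start_line) != 1)),
           .add = TRUE) %>%
  # keep the original line, then spread the run's first/last values over its rows
  mutate(real_line = start_line,
         start_line = first(start_line), end_line = last(end_line)) %>%
  # number the (id1, id2, grp) groups
  mutate(cid = cur_group_id()) %>%
  ungroup() %>%
  select(-grp)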
-output
# A tibble: 7 x 7
#   id1   id2   start_line end_line content     real_line   cid
#   <chr> <chr>      <int>    <int> <chr>           <int> <int>
#1  A     B              1        1 "aaaa"              1     1
#2  A     B              4        6 "aa mm"             4     2
#3  A     B              4        6 "boool"             5     2
#4  A     B              4        6 "omw"               6     2
#5  C     D              6        7 "hear!"             6     3
#6  C     D              6        7 " me out!"          7     3
#7  C     D             21       21 "hello"            21     4
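To see why the cumsum(c(TRUE, diff(...) != 1)) expression produces one id per run of consecutive lines, here is a minimal base R sketch. The vector below is a hypothetical stand-in for the start_line values of a single (id1, id2) pair; it is only meant to illustrate the idea, not to replace the pipeline above.

```r
# Hypothetical input: start_line values of one (id1, id2) pair.
start_line <- c(1L, 4L, 5L, 6L)

# diff() gives the gap between each element and the previous one.
gaps <- diff(start_line)

# A new run starts at the first element and wherever the gap is not exactly 1.
new_run <- c(TRUE, gaps != 1)

# cumsum() over the logical flags increases by one at every run start,
# so all rows of the same run share the same id.
grp <- cumsum(new_run)
print(grp)
```

Because the flags are computed inside group_by(id1, id2), the first row of every id pair always starts a new run, even if its start_line happens to follow the last row of the previous pair.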
Upvotes: 1
Reputation: 1285
Okay, the problem was that I wasn't ungrouping.
So, based on the question "R - Concatenate cell in dataframe, by group, depending on another cell value", I did:
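library(dplyr)

# keep the original line number before start_line is overwritten
mydf$real_line <- mydf$start_line
# assign the result back, otherwise mydf stays unchanged
mydf <- mydf %>%
  group_by(id1, id2, grp = cumsum(c(TRUE, diff(start_line) > 1))) %>%
  mutate(start_line = first(start_line), end_line = last(end_line)) %>%
  ungroup()
# drop the helper column
mydf$grp <- NULL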
And this generated the result I needed, but without the ID per group.
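For the missing ID per group, one possible approach is sketched below in base R only, as an alternative to the dplyr code above. It rebuilds the sample data from the question, computes the run per (id1, id2) pair, and numbers the runs by first appearance. The helper names pair, run and key are made up for illustration, and the "|" separator assumes the id values never contain that character.

```r
# Sample data from the question.
mydf <- data.frame(
  id1 = c("A", "A", "A", "A", "C", "C", "C"),
  id2 = c("B", "B", "B", "B", "D", "D", "D"),
  start_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  end_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  content = c("aaaa", "aa mm", "boool", "omw", "hear!", " me out!", "hello"),
  stringsAsFactors = FALSE
)

# Keep the original line number before start_line is overwritten.
mydf$real_line <- mydf$start_line

# Run id within each (id1, id2) pair: a new run starts when the gap is not 1.
pair <- interaction(mydf$id1, mydf$id2, drop = TRUE)
run <- ave(mydf$start_line, pair,
           FUN = function(x) cumsum(c(TRUE, diff(x) != 1)))

# One key per (id1, id2, run); assumes ids never contain "|".
key <- paste(mydf$id1, mydf$id2, run, sep = "|")

# Number the keys in order of first appearance to get cid.
mydf$cid <- match(key, unique(key))

# Spread each run's first start_line and last end_line over its rows.
mydf$start_line <- ave(mydf$real_line, key, FUN = function(x) x[1])
mydf$end_line <- ave(mydf$end_line, key, FUN = function(x) x[length(x)])

print(mydf)
```

Note that match(key, unique(key)) numbers groups in order of appearance, whereas dplyr's cur_group_id() numbers them in sorted group order; for data already sorted by id1 and id2, as here, the two agree.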
Upvotes: 1