Reputation: 1160
With the dataset below:
id from to trip
1 1 A B <NA>
2 1 B C X
3 1 C D <NA>
4 1 D A X
5 2 B A X
6 2 A C <NA>
7 2 C D <NA>
I'm trying to split the data whenever I reach an X on the trip column, in order to obtain something like this(note that there's a new column):
id from to trip group
1 1 A B <NA> 1
2 1 B C X 1
3 1 C D <NA> 2
4 1 D A X 2
5 2 B A X 3
6 2 A C <NA> 4
7 2 C D <NA> 4
What this means is that it goes through the observations/rows, and based on the id and trip column, it starts a new number every time it reaches an X.
I've tried a couple of things with group_by(df, id, trip)
, but I always end up without a similar structure to that one.
Any suggestions?
Upvotes: 1
Views: 143
Reputation: 38500
This base R method works for the example:
df$group <- c(1, head(cumsum(df$trip == "X") + 1, -1))
In the example, it returns
df
id from to trip group
1 1 A B <NA> 1
2 1 B C X 1
3 1 C D <NA> 2
4 1 D A X 2
5 2 B A X 3
6 2 A C <NA> 4
7 2 C D <NA> 4
Also, in the case that the first observation is "X", then 1 will be the first element and 2 will be the second element, which I believe is what is desired.
As @manotheshark mentions in the commments, this solution will not work on NA values, because cumsum
will return NA after the first NA is encountered. The suggested solution would be to replace df$trip == "X"
with is.na(df$trip)
which will return the correct values in the presence of NAs.
Upvotes: 2
Reputation: 330
Try this (if you actually need to start new group on row after X i think you can achieve that with another shift() on the trip == "X":
library(data.table)
set.seed(1)
na.zero <- function (x) {
x[is.na(x)] <- 0
return(x)
}
dt <- data.table(id = c(1,1,1,2,2,2),
from = sample(c("A", "B", "C", "D"), 6, replace = T),
to = sample(c("A", "B", "C", "D"), 6, replace = T),
trip = sample(c("X", NA), 6, replace = T))
dt[, group:=(cumsum(na.zero(trip=="X"))+cumsum(id-na.zero(shift(id, 1L,type = 'lag'))))]
id from to trip group
1: 1 B D NA 1
2: 1 B C X 2
3: 1 C C NA 2
4: 2 D A X 4
5: 2 A A NA 4
6: 2 D A NA 4
Upvotes: 0