Reputation: 145
I want to add "seq" column as numbering by group which is organized with BSTN, ASTN, O and D. This work is hard to me to classify as another group within the group. A group is created whenever BSTN and O are equal, and ASTN and D are equal.
BSTN ASTN O D
150 202 150 151
150 202 151 201
150 202 201 202
150 202 150 426
150 202 426 423
150 202 423 321
150 202 321 320
150 202 320 203
150 202 203 202
150 403 150 426
150 403 426 423
150 403 423 321
150 403 321 320
150 403 320 403
150 403 150 151
150 403 151 201
150 403 201 403
150 403 150 426
150 403 426 422
150 403 422 205
150 403 205 403
in case of BSTN 150 to ASTN 202, it has two groups, since there are two 150s and 202s in O and D respectively. The "seq" column what I want to is like this.
BSTN ASTN O D seq
150 202 150 151 1
150 202 151 201 1
150 202 201 202 1
150 202 150 426 2
150 202 426 423 2
150 202 423 321 2
150 202 321 320 2
150 202 320 203 2
150 202 203 202 2
150 403 150 426 1
150 403 426 423 1
150 403 423 321 1
150 403 321 320 1
150 403 320 403 1
150 403 150 151 2
150 403 151 201 2
150 403 201 403 2
150 403 150 426 3
150 403 426 422 3
150 403 422 205 3
150 403 205 403 3
please help me!!
Upvotes: 1
Views: 113
Reputation: 35554
I attempt another way with tapply()
to make groups.
attach(df)
tapply(O, list(BSTN, ASTN), function(x) cumsum(x == x[1]) ) %>% unlist
[1] 1 1 1 2 2 2 2 2 2 1 1 1 1 1 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 1
[38] 1 1 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2
[75] 2 1 1 1 1 2 2 2 2 1 1 1 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4 4 1 1 1 1
It gets the same output as the first answer. Hope to help you.
Upvotes: 2
Reputation: 145755
Here's a dplyr
solution:
library(dplyr)
df %>% group_by(BSTN, ASTN) %>%
mutate(g = lag(D) != O,
g = if_else(is.na(g), FALSE, g),
seq = cumsum(g) + 1)
# A tibble: 21 x 6
# Groups: BSTN, ASTN [2]
BSTN ASTN O D g seq
<int> <int> <int> <int> <lgl> <dbl>
1 150 202 150 151 FALSE 1
2 150 202 151 201 FALSE 1
3 150 202 201 202 FALSE 1
4 150 202 150 426 TRUE 2
5 150 202 426 423 FALSE 2
6 150 202 423 321 FALSE 2
7 150 202 321 320 FALSE 2
8 150 202 320 203 FALSE 2
9 150 202 203 202 FALSE 2
10 150 403 150 426 FALSE 1
# ... with 11 more rows
You can, of course, remove the g
column.
Using this data:
df = read.table(text = "BSTN ASTN O D
150 202 150 151
150 202 151 201
150 202 201 202
150 202 150 426
150 202 426 423
150 202 423 321
150 202 321 320
150 202 320 203
150 202 203 202
150 403 150 426
150 403 426 423
150 403 423 321
150 403 321 320
150 403 320 403
150 403 150 151
150 403 151 201
150 403 201 403
150 403 150 426
150 403 426 422
150 403 422 205
150 403 205 403", header = T)
Upvotes: 5