Reputation: 1329
I feel like I missed something, but I can't find an answer for it. I'm trying to define group_id where group starts when condition is matched. In this case condition is valuex doesn't equal valuey. This can be easily achived with window functions in SQL but I have no clue what to do in R
here is the input value.
Name|valuex|valuey
A | X| X
A | X| X
A | X| X
A | X| Z
A | Z| Z
A | Z| Z
A | Z| Z
A | Z| Y
expected output
Name|valuex|valuey|group
A | X| X|0
A | X| X|0
A | X| X|0
A | X| Z|1
A | Z| Z|1
A | Z| Z|1
A | Z| Z|1
A | Z| Y|2
Upvotes: 1
Views: 77
Reputation: 886968
An option with dplyr
library(dplyr)
dat %>%
group_by(Name) %>%
mutate(group = cumsum(valuex != valuey))
# A tibble: 8 x 4
# Groups: Name [1]
# Name valuex valuey group
# <chr> <chr> <chr> <int>
#1 A X X 0
#2 A X X 0
#3 A X X 0
#4 A X Z 1
#5 A Z Z 1
#6 A Z Z 1
#7 A Z Z 1
#8 A Z Y 2
dat <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X",
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 1
Reputation: 101139
Here is another base R solution using findInterval
, e.g.,
dfout <- within(df,group <- findInterval(seq(nrow(df)),which(valuex!=valuey)))
which gives
> dfout
Name valuex valuey group
1 A X X 0
2 A X X 0
3 A X X 0
4 A X Z 1
5 A Z Z 1
6 A Z Z 1
7 A Z Z 1
8 A Z Y 2
Data
df <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X",
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 1
Reputation: 160407
Base R:
dat$group <- cumsum(dat$valuex != dat$valuey)
dat
# Name valuex valuey group
# 1 A X X 0
# 2 A X X 0
# 3 A X X 0
# 4 A X Z 1
# 5 A Z Z 1
# 6 A Z Z 1
# 7 A Z Z 1
# 8 A Z Y 2
Data:
dat <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X",
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 5