AlienDeg
AlienDeg

Reputation: 1329

Count row if condition is met

I feel like I missed something, but I can't find an answer for it. I'm trying to define group_id where group starts when condition is matched. In this case condition is valuex doesn't equal valuey. This can be easily achived with window functions in SQL but I have no clue what to do in R

here is the input value.

Name|valuex|valuey
A   |     X|     X
A   |     X|     X
A   |     X|     X
A   |     X|     Z
A   |     Z|     Z
A   |     Z|     Z
A   |     Z|     Z
A   |     Z|     Y

expected output

Name|valuex|valuey|group
A   |     X|     X|0
A   |     X|     X|0
A   |     X|     X|0
A   |     X|     Z|1
A   |     Z|     Z|1
A   |     Z|     Z|1
A   |     Z|     Z|1
A   |     Z|     Y|2

Upvotes: 1

Views: 77

Answers (3)

akrun
akrun

Reputation: 886968

An option with dplyr

library(dplyr)
dat %>%
     group_by(Name) %>%
     mutate(group = cumsum(valuex != valuey))
# A tibble: 8 x 4
# Groups:   Name [1]
#  Name  valuex valuey group
#  <chr> <chr>  <chr>  <int>
#1 A     X      X          0
#2 A     X      X          0
#3 A     X      X          0
#4 A     X      Z          1
#5 A     Z      Z          1
#6 A     Z      Z          1
#7 A     Z      Z          1
#8 A     Z      Y          2

data

dat <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X", 
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 1

ThomasIsCoding
ThomasIsCoding

Reputation: 101139

Here is another base R solution using findInterval, e.g.,

dfout <- within(df,group <- findInterval(seq(nrow(df)),which(valuex!=valuey)))

which gives

> dfout
  Name valuex valuey group
1    A      X      X     0
2    A      X      X     0
3    A      X      X     0
4    A      X      Z     1
5    A      Z      Z     1
6    A      Z      Z     1
7    A      Z      Z     1
8    A      Z      Y     2

Data

df <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X", 
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA,
-8L))

Upvotes: 1

r2evans
r2evans

Reputation: 160407

Base R:

dat$group <- cumsum(dat$valuex != dat$valuey)
dat
#   Name valuex valuey group
# 1    A      X      X     0
# 2    A      X      X     0
# 3    A      X      X     0
# 4    A      X      Z     1
# 5    A      Z      Z     1
# 6    A      Z      Z     1
# 7    A      Z      Z     1
# 8    A      Z      Y     2

Data:

dat <- structure(list(Name = c("A", "A", "A", "A", "A", "A", "A", "A"
), valuex = c("X", "X", "X", "X", "Z", "Z", "Z", "Z"), valuey = c("X", 
"X", "X", "Z", "Z", "Z", "Z", "Y")), class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 5

Related Questions