Reputation: 153
Let's say I have this data frame in R:
x <-data.frame(group=c("A","A","A", "B","B", "C","C","C"), cond=c("g1", "g2", "g1", "g3", "g2", "g2", "g1", "g2"), order=c(1,2,3,2,1,2,3,1))
group cond order
A g1 1
A g2 2
A g1 3
B g3 2
B g2 1
C g2 2
C g1 3
C g2 1
I want order groups according to the order variable to make two variables: v1) yes/no whether cond=="g1" for any of the previous rows in a group, and v2) yes/no whether, for a given row, there is a previous row in the group with a different value for cond (v2=no when group=C and order=1 or 2 but v2=yes when group=C and order=3). I know I can use dplyr to arrange the groups:
x %>% group_by(group) %>% arrange(order, .by_group=T)
but I don't know where to go from there since lag()
and lead()
are only for the row immediately before or after.
Upvotes: 3
Views: 2255
Reputation: 153
Here is a method that works but is definitely not the most efficient:
x <-data.frame(group=c("A","A","A", "B","B", "C","C","C"),
cond=c("g1", "g2", "g1", "g3", "g2", "g2", "g1", "g2"),
order=c(1,2,3,2,1,2,3,1))
x<-x %>% group_by(group) %>%
mutate(cond_1=lag(cond),
cond_2=ifelse(order>2, lag(cond_1), NA),
cond_2=ifelse(order>3, lag(cond_1), NA)
) %>% ungroup()
#doublecheck there are enough lag variables
max_order<-max(x$order, na.rm = T)
lag_var<-sum(grepl("cond_", names(x)))
if (lag_var<(max_order-1)){
stop("Add more lag variables (cond_#) to make sure you get all previous conditions")
} else { #do nothing
}
x <- x %>%
mutate(v1=case_when(order !=1 ~ as.numeric(rowSums(
sapply(x %>% select(., starts_with("cond_")),
function(x) grepl("^g1$", x, ignore.case = TRUE) ))>0)),
v2=as.numeric(apply(x %>% select(., starts_with("cond")), 1,
function(x) length(unique(na.omit(x)))>1)))
Upvotes: 0
Reputation: 17069
cond
. (I use pivot_wider
for this.)cond
code > 0, it means that value has occurred in the current or a previous row. We can exclude the current row by using dplyr::lag
.cond
values have been seen so far. If this is >1, then we know a previous row must have had a different value from the current row.select
ing away the one-hot codes and other helper columns.library(dplyr)
library(tidyr)
x <-data.frame(
group=c("A","A","A", "B","B", "C","C","C"),
cond=c("g1", "g2", "g1", "g3", "g2", "g2", "g1", "g2"),
order=c(1,2,3,2,1,2,3,1)
)
x2 <- x %>%
mutate(cond_pivot = cond, cond_value = 1) %>%
pivot_wider(
names_from = cond_pivot,
values_from = cond_value,
values_fill = 0
) %>%
group_by(group) %>%
arrange(order, .by_group = TRUE) %>%
mutate(
across(g1:g3, ~ cumsum(.x) > 0, .names = "{.col}_seen_yet"),
n_seen_so_far = rowSums(across(g1_seen_yet:g3_seen_yet)),
v1 = dplyr::lag(g1_seen_yet, default = FALSE),
v2 = n_seen_so_far > 1
) %>%
select(group:order, v1:v2)
Output:
# A tibble: 8 x 5
# Groups: group [3]
group cond order v1 v2
<chr> <chr> <dbl> <lgl> <lgl>
1 A g1 1 FALSE FALSE
2 A g2 2 TRUE TRUE
3 A g1 3 TRUE TRUE
4 B g2 1 FALSE FALSE
5 B g3 2 FALSE TRUE
6 C g2 1 FALSE FALSE
7 C g2 2 FALSE FALSE
8 C g1 3 FALSE TRUE
Upvotes: 2