Reputation: 379
I've got a bunch of columns all starting with the prefix wtp_
that occur in the midst of a wide dataframe (with several columns before and after the wtp_
columns). Mini example:
df <- tribble(~id, ~complete, ~wtp_20,~wtp_40,~wtp_60,~wtp_80,~wtp_100, ~sex,
1, 1, 0,0,1,1,1, "F",
2, 0, 0,0,0,1,1, "F",
3, 0, 0,0,0,0,1, "M",
4, 1, 1,1,1,1,1, "M",
5, 1, 0,0,0,0,0, "M",
6, 0, 0,1,1,1,1, "F"); df
What I'm looking for: I need to create a new variable (min_wtp
) that returns the name of the column the first time that one of the wtp_
columns switches from 0 to 1. In other words, I need a solution to create the following:
df_needed <- tribble(~id, ~complete, ~wtp_20,~wtp_40,~wtp_60,~wtp_80,~wtp_100, ~sex, ~min_wtp,
1, 1, 0,0,1,1,1, "F", "wtp_60",
2, 0, 0,0,0,1,1, "F", "wtp_80",
3, 0, 0,0,0,0,1, "M", "wtp_100",
4, 1, 1,1,1,1,1, "M", "wtp_20",
5, 1, 0,0,0,0,0, "M", "NA",
6, 0, 0,1,1,1,1, "F", "wtp_40"); df_needed
Please note the following complications:
-Some people (like id==5) never change to 1 while others (like id==4) are 1 all along.
-There are some irrelevant columns occurring before the wtp_
columns that have 0s and 1s in them which should be ignored in the construction of min_wtp
.
-There are way more columns (including wtp_
columns) than the minimal example I included above.
I've tried playing with which
and colnames
functions in combination with select(starts_with("wtp_"))
but have been unsuccessful.
If anyone has a dplyr solution, that would be preferred.
Upvotes: 1
Views: 696
Reputation: 93908
If it never goes backwards from 1 to 0, then you can find the change point very quickly with some basic sums:
sw <- startsWith(names(df), "wtp_")
names(df[sw])[sum(sw) - rowSums(df[sw]) + 1]
#[1] "wtp_60" "wtp_80" "wtp_100" "wtp_20" NA "wtp_40"
Upvotes: 1
Reputation: 389135
It would be much easier if you get the data in long format :
library(dplyr)
df %>%
tidyr::pivot_longer(cols = starts_with('wtp')) %>%
group_by(id) %>%
summarise(min_wtp = name[which(value == 1 &
lag(value, default = 0) == 0)[1]]) %>%
left_join(df, by = 'id')
# A tibble: 6 x 9
# id min_wtp complete wtp_20 wtp_40 wtp_60 wtp_80 wtp_100 sex
# <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 1 wtp_60 1 0 0 1 1 1 F
#2 2 wtp_80 0 0 0 0 1 1 F
#3 3 wtp_100 0 0 0 0 0 1 M
#4 4 wtp_20 1 1 1 1 1 1 M
#5 5 NA 1 0 0 0 0 0 M
#6 6 wtp_40 0 0 1 1 1 1 F
Without reshaping the data you can use rowwise
with c_across
:
apply_fun <- function(x) {
which(x == 1 & lag(x, default = 0) == 0)[1]
}
cols <- grep('^wtp', names(df), value = TRUE)
df %>%
rowwise() %>%
mutate(min_wtp = cols[apply_fun(c_across(cols))])
Upvotes: 2
Reputation: 93851
We can use apply
to get, for each row, the number of first column that satisfies your conditions. Then we use that number as the index to get the column name.
df$min_wtp = apply(df[ , grepl("wtp", names(df))], 1, function(x) {
names(x)[min(which(x > 0))]
})
df
id complete wtp_20 wtp_40 wtp_60 wtp_80 wtp_100 sex min_wtp <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> 1 1 1 0 0 1 1 1 F wtp_60 2 2 0 0 0 0 1 1 F wtp_80 3 3 0 0 0 0 0 1 M wtp_100 4 4 1 1 1 1 1 1 M wtp_20 5 5 1 0 0 0 0 0 M NA 6 6 0 0 1 1 1 1 F wtp_40
Upvotes: 2