Reputation: 323
I have the following df.
empid cond1 cond2 cond3 cond4
234 1 0 1 1
7890 2 1 1 0
38476 1 1 0 1
456 0 1 1 0
The empid column is of type character and the rest are numeric columns. I need the list of empids for every cond1, cond2, cond3 , and cond4 that have values greater than 0. Right now, I'm creating a list of column names from 2:5 , using a for loop to iterate over the columns, select and then filter on that particular column.
I feel like it can be done more efficiently using lapply but I am not sure how to include columns to select since I only need empid column in my result.
As output, what would work is may be a new list object, containing values of empid filtered. So , for example, something like: new list object,
cond1 <- "234","7890","38476"
Thank you in advance.
Upvotes: 2
Views: 76
Reputation: 39858
One dplyr
option could be:
df %>%
mutate_at(vars(starts_with("cond")), ~ list(empid[. != 0]))
empid cond1 cond2 cond3 cond4
1 234 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
2 7890 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
3 38476 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
4 456 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
If you are looking for storing it in a long format, with the addition of tidyr
:
df %>%
pivot_longer(-empid, names_to = "condition") %>%
group_by(condition) %>%
summarise(empid = list(empid[value != 0]))
condition empid
<chr> <list>
1 cond1 <int [3]>
2 cond2 <int [3]>
3 cond3 <int [3]>
4 cond4 <int [2]>
Upvotes: 1
Reputation: 51582
Here is a solution via base R,
aggregate(empid ~ ind, subset(cbind(empid = df$empid, stack(df[-1])), values > 0), list)
# ind empid
#1 cond1 234, 7890, 38476
#2 cond2 7890, 38476, 456
#3 cond3 234, 7890, 456
#4 cond4 234, 38476
NOTE:
If we keep empid
as a character, it will confuse the stack
with factor. To avoid that, we can add as.integer()
in the subset
, i.e.
aggregate(empid ~ ind, subset(cbind(empid = as.integer(df$empid), stack(df[-1])), values > 0), list)
Upvotes: 2