savi
savi

Reputation: 323

column-wise filtering of dataframe

I have the following df.

empid  cond1  cond2  cond3 cond4
234     1      0      1     1
7890    2      1      1     0
38476   1      1      0     1
456     0      1      1     0

The empid column is of type character and the rest are numeric columns. I need the list of empids for every cond1, cond2, cond3 , and cond4 that have values greater than 0. Right now, I'm creating a list of column names from 2:5 , using a for loop to iterate over the columns, select and then filter on that particular column.

I feel like it can be done more efficiently using lapply but I am not sure how to include columns to select since I only need empid column in my result.

As output, what would work is may be a new list object, containing values of empid filtered. So , for example, something like: new list object,

cond1 <- "234","7890","38476"

Thank you in advance.

Upvotes: 2

Views: 76

Answers (2)

tmfmnk
tmfmnk

Reputation: 39858

One dplyr option could be:

df %>%
 mutate_at(vars(starts_with("cond")), ~ list(empid[. != 0]))

  empid            cond1            cond2          cond3      cond4
1   234 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
2  7890 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
3 38476 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476
4   456 234, 7890, 38476 7890, 38476, 456 234, 7890, 456 234, 38476

If you are looking for storing it in a long format, with the addition of tidyr:

df %>%
 pivot_longer(-empid, names_to = "condition") %>%
 group_by(condition) %>%
 summarise(empid = list(empid[value != 0]))

  condition empid    
  <chr>     <list>   
1 cond1     <int [3]>
2 cond2     <int [3]>
3 cond3     <int [3]>
4 cond4     <int [2]>

Upvotes: 1

Sotos
Sotos

Reputation: 51582

Here is a solution via base R,

aggregate(empid ~ ind, subset(cbind(empid = df$empid, stack(df[-1])), values > 0), list)

#    ind            empid
#1 cond1 234, 7890, 38476
#2 cond2 7890, 38476, 456
#3 cond3   234, 7890, 456
#4 cond4       234, 38476

NOTE:

If we keep empid as a character, it will confuse the stack with factor. To avoid that, we can add as.integer() in the subset, i.e.

aggregate(empid ~ ind, subset(cbind(empid = as.integer(df$empid), stack(df[-1])), values > 0), list)

Upvotes: 2

Related Questions