Filter using paste and name in dplyr

Question

Sample data

df <- data.frame(loc.id = rep(1:5, each = 6), day = sample(1:365,30), 
                       ref.day1 = rep(c(20,30,50,80,90), each = 6),
                       ref.day2 = rep(c(10,28,33,49,67), each = 6),
                       ref.day3 = rep(c(31,49,65,55,42), each = 6))

For each loc.id, if I want to keep days that are >= then ref.day1, I do this:

df %>% group_by(loc.id) %>% dplyr::filter(day >= ref.day1)

I want to make 3 data frames, each whose rows are filtered by ref.day1, ref.day2,ref.day3 respectively

I tried this:

col.names <- c("ref.day1","ref.day2","ref.day3")
temp.list <- list()
for(cl in seq_along(col.names)){

    col.sub <- col.names[cl] 
    columns <- c("loc.id","day",col.sub)

    df.sub <- df[,columns]
    temp.dat <- df.sub %>% group_by(loc.id) %>% dplyr::filter(day >= paste0(col.sub)) # this line does not work
    temp.list[[cl]] <- temp.dat
}

final.dat <- rbindlist(temp.list)

I was wondering how to refer to columns by names and paste function in dplyr in order to filter it out.

Yifu Yan · Accepted Answer

The reason why your original code doesn't work is that your col.names are strings, but dplyr function uses non-standard evaluation which doesn't accept strings. So you need to convert the string into variables.rlang::sym() can do that.
Also, you can use map function in purrr package, which is much more compact:

library(dplyr)
library(purrr)
col_names <- c("ref.day1","ref.day2","ref.day3")
map(col_names,~ df %>% dplyr::filter(day >= UQ(rlang::sym(.x))))
#it will return you a list of dataframes

By the way I removed group_by() because they don't seem to be useful.

Returned result:

    [[1]]
    loc.id day ref.day1 ref.day2 ref.day3
    1       1 362       20       10       31
    2       1  69       20       10       31
    3       1  65       20       10       31
    4       1  88       20       10       31
    5       1 142       20       10       31
    6       2 355       30       28       49
    7       2 255       30       28       49
    8       2 136       30       28       49
    9       2 156       30       28       49
    10      2 194       30       28       49
    11      2 204       30       28       49
    12      3 129       50       33       65
    13      3 254       50       33       65
    14      3 279       50       33       65
    15      3 201       50       33       65
    16      3 282       50       33       65
    17      4 351       80       49       55
    18      4 114       80       49       55
    19      4 338       80       49       55
    20      4 283       80       49       55
    21      5 199       90       67       42
    22      5 141       90       67       42
    23      5 241       90       67       42
    24      5 187       90       67       42

    [[2]]
    loc.id day ref.day1 ref.day2 ref.day3
    1       1  16       20       10       31
    2       1 362       20       10       31
    3       1  69       20       10       31
    4       1  65       20       10       31
    5       1  88       20       10       31
    6       1 142       20       10       31
    7       2 355       30       28       49
    8       2 255       30       28       49
    9       2 136       30       28       49
    10      2 156       30       28       49
    11      2 194       30       28       49
    12      2 204       30       28       49
    13      3 129       50       33       65
    14      3 254       50       33       65
    15      3 279       50       33       65
    16      3 201       50       33       65
    17      3 282       50       33       65
    18      4 351       80       49       55
    19      4 114       80       49       55
    20      4 338       80       49       55
    21      4 283       80       49       55
    22      4  79       80       49       55
    23      5 199       90       67       42
    24      5  67       90       67       42
    25      5 141       90       67       42
    26      5 241       90       67       42
    27      5 187       90       67       42

    [[3]]
    loc.id day ref.day1 ref.day2 ref.day3
    1       1 362       20       10       31
    2       1  69       20       10       31
    3       1  65       20       10       31
    4       1  88       20       10       31
    5       1 142       20       10       31
    6       2 355       30       28       49
    7       2 255       30       28       49
    8       2 136       30       28       49
    9       2 156       30       28       49
    10      2 194       30       28       49
    11      2 204       30       28       49
    12      3 129       50       33       65
    13      3 254       50       33       65
    14      3 279       50       33       65
    15      3 201       50       33       65
    16      3 282       50       33       65
    17      4 351       80       49       55
    18      4 114       80       49       55
    19      4 338       80       49       55
    20      4 283       80       49       55
    21      4  79       80       49       55
    22      5 199       90       67       42
    23      5  67       90       67       42
    24      5 141       90       67       42
    25      5 241       90       67       42
    26      5 187       90       67       42

You may also want to check these:
https://dplyr.tidyverse.org/articles/programming.html
Use variable names in functions of dplyr

Filter using paste and name in dplyr

Answers (1)

Related Questions