Reputation: 3805
Sample data
df <- data.frame(loc.id = rep(1:5, each = 6), day = sample(1:365,30),
ref.day1 = rep(c(20,30,50,80,90), each = 6),
ref.day2 = rep(c(10,28,33,49,67), each = 6),
ref.day3 = rep(c(31,49,65,55,42), each = 6))
For each loc.id, if I want to keep days that are >= then ref.day1, I do this:
df %>% group_by(loc.id) %>% dplyr::filter(day >= ref.day1)
I want to make 3 data frames, each whose rows are filtered by ref.day1
, ref.day2
,ref.day3
respectively
I tried this:
col.names <- c("ref.day1","ref.day2","ref.day3")
temp.list <- list()
for(cl in seq_along(col.names)){
col.sub <- col.names[cl]
columns <- c("loc.id","day",col.sub)
df.sub <- df[,columns]
temp.dat <- df.sub %>% group_by(loc.id) %>% dplyr::filter(day >= paste0(col.sub)) # this line does not work
temp.list[[cl]] <- temp.dat
}
final.dat <- rbindlist(temp.list)
I was wondering how to refer to columns by names and paste function in dplyr in order to filter it out.
Upvotes: 1
Views: 2361
Reputation: 6106
The reason why your original code doesn't work is that your col.names are strings, but dplyr function uses non-standard evaluation which doesn't accept strings. So you need to convert the string into variables.rlang::sym()
can do that.
Also, you can use map function in purrr package, which is much more compact:
library(dplyr)
library(purrr)
col_names <- c("ref.day1","ref.day2","ref.day3")
map(col_names,~ df %>% dplyr::filter(day >= UQ(rlang::sym(.x))))
#it will return you a list of dataframes
By the way I removed group_by()
because they don't seem to be useful.
Returned result:
[[1]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 362 20 10 31
2 1 69 20 10 31
3 1 65 20 10 31
4 1 88 20 10 31
5 1 142 20 10 31
6 2 355 30 28 49
7 2 255 30 28 49
8 2 136 30 28 49
9 2 156 30 28 49
10 2 194 30 28 49
11 2 204 30 28 49
12 3 129 50 33 65
13 3 254 50 33 65
14 3 279 50 33 65
15 3 201 50 33 65
16 3 282 50 33 65
17 4 351 80 49 55
18 4 114 80 49 55
19 4 338 80 49 55
20 4 283 80 49 55
21 5 199 90 67 42
22 5 141 90 67 42
23 5 241 90 67 42
24 5 187 90 67 42
[[2]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 16 20 10 31
2 1 362 20 10 31
3 1 69 20 10 31
4 1 65 20 10 31
5 1 88 20 10 31
6 1 142 20 10 31
7 2 355 30 28 49
8 2 255 30 28 49
9 2 136 30 28 49
10 2 156 30 28 49
11 2 194 30 28 49
12 2 204 30 28 49
13 3 129 50 33 65
14 3 254 50 33 65
15 3 279 50 33 65
16 3 201 50 33 65
17 3 282 50 33 65
18 4 351 80 49 55
19 4 114 80 49 55
20 4 338 80 49 55
21 4 283 80 49 55
22 4 79 80 49 55
23 5 199 90 67 42
24 5 67 90 67 42
25 5 141 90 67 42
26 5 241 90 67 42
27 5 187 90 67 42
[[3]]
loc.id day ref.day1 ref.day2 ref.day3
1 1 362 20 10 31
2 1 69 20 10 31
3 1 65 20 10 31
4 1 88 20 10 31
5 1 142 20 10 31
6 2 355 30 28 49
7 2 255 30 28 49
8 2 136 30 28 49
9 2 156 30 28 49
10 2 194 30 28 49
11 2 204 30 28 49
12 3 129 50 33 65
13 3 254 50 33 65
14 3 279 50 33 65
15 3 201 50 33 65
16 3 282 50 33 65
17 4 351 80 49 55
18 4 114 80 49 55
19 4 338 80 49 55
20 4 283 80 49 55
21 4 79 80 49 55
22 5 199 90 67 42
23 5 67 90 67 42
24 5 141 90 67 42
25 5 241 90 67 42
26 5 187 90 67 42
You may also want to check these:
https://dplyr.tidyverse.org/articles/programming.html
Use variable names in functions of dplyr
Upvotes: 1