Reputation: 3805
Sample data
library(dplyr)  # attaches %>%, which is used throughout below
dat <- data.frame(Sim.Y1 = rnorm(10), Sim.Y2 = rnorm(10),
                  Sim.Y3 = rnorm(10), obsY = rnorm(10),
                  ID = sample(1:10, 10), ID_s = rep(1:2, each = 5))
For each column named in the following vector (minus its .cor suffix), I want to calculate that column's mean within each ID_s group:
simVec <- c('Sim.Y1.cor','Sim.Y2.cor')
for(s in simVec){
  simRef <- simVec[s]
  simID <- unlist(strsplit(simRef, split = '.cor', fixed = T))[1]
  # this works
  dat %>% dplyr::group_by(ID_s) %>%
    dplyr::summarise(meanMod = mean(Sim.Y1))
  # this doesn't work
  dat %>% dplyr::group_by(ID_s) %>%
    dplyr::summarise(meanMod = mean(!!(simID)))
}
How do I refer to a column in dplyr through a variable holding its name, rather than by writing the name explicitly?
Upvotes: 0
Views: 67
Reputation: 13691
Note that your particular task can be performed without any non-standard evaluation by using summarize_at(), which works directly with strings:
simIDs <- stringr::str_split(simVec, stringr::fixed(".cor")) %>% purrr::map_chr(1)  # fixed(): match ".cor" literally, not as a regex
# [1] "Sim.Y1" "Sim.Y2"
dat %>% dplyr::group_by(ID_s) %>% dplyr::summarise_at(simIDs, mean)
# # A tibble: 2 x 3
# ID_s Sim.Y1 Sim.Y2
# <int> <dbl> <dbl>
# 1 1 0.494 -0.0522
# 2 2 -0.104 -0.370
A custom suffix can also be added by supplying the function as a named list:
dat %>% dplyr::group_by(ID_s) %>% dplyr::summarise_at(simIDs, list(m=mean))
# # A tibble: 2 x 3
# ID_s Sim.Y1_m Sim.Y2_m <--- Note the _m suffix
# <int> <dbl> <dbl>
# 1 1 0.494 -0.0522
# 2 2 -0.104 -0.370
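Since dplyr 1.0.0, summarise_at() is superseded by across(). A minimal sketch of the equivalent calls, assuming dplyr >= 1.0.0; it recreates the question's sample data with a set.seed() added, and swaps the stringr/purrr step for base sub() (my substitution, not part of the answer above):

```r
library(dplyr)

set.seed(1)
# Same sample data as in the question
dat <- data.frame(Sim.Y1 = rnorm(10), Sim.Y2 = rnorm(10),
                  Sim.Y3 = rnorm(10), obsY = rnorm(10),
                  ID = sample(1:10, 10), ID_s = rep(1:2, each = 5))
simVec <- c('Sim.Y1.cor', 'Sim.Y2.cor')

# Strip the literal ".cor" suffix (fixed = TRUE, so "." is not a regex wildcard)
simIDs <- sub(".cor", "", simVec, fixed = TRUE)

# all_of() lets across() select columns from a character vector
res <- dat %>%
  group_by(ID_s) %>%
  summarise(across(all_of(simIDs), mean))

# A named list of functions yields the "_m" suffix, as with summarise_at()
res_m <- dat %>%
  group_by(ID_s) %>%
  summarise(across(all_of(simIDs), list(m = mean)))

print(res)
print(res_m)
```

The column names come out the same as in the summarise_at() version (Sim.Y1, Sim.Y2, or Sim.Y1_m, Sim.Y2_m with the named list).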
Upvotes: 2
Reputation: 616
I understand the question to be: how do you get a column without referencing its name, i.e. by using its index instead?
Let me know if I've misunderstood.
If not, I believe the easiest way is as follows:
> df1 <- data.frame(ID_s=c('a','b','c'),Val=c('a1','b1','c1'), stringsAsFactors = TRUE)
> df1
ID_s Val
1 a a1
2 b b1
3 c c1
> df1[,1]
[1] a b c
Levels: a b c
If you want to keep the result as a data frame, this can be extended as follows:
cc <- data.frame(ID_s=df1[,1])
Hope this helps!
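The same index-based idea also works inside a dplyr pipeline. A minimal sketch, assuming dplyr >= 1.0.0 and the question's sample data (with a set.seed() added); idx is a hypothetical column position chosen for illustration:

```r
library(dplyr)

set.seed(1)
dat <- data.frame(Sim.Y1 = rnorm(10), Sim.Y2 = rnorm(10),
                  Sim.Y3 = rnorm(10), obsY = rnorm(10),
                  ID = sample(1:10, 10), ID_s = rep(1:2, each = 5))

# Base R: drop = FALSE keeps a one-column data frame instead of a vector
first_col_df <- dat[, 1, drop = FALSE]

# dplyr: select() and pull() both accept a column position
first_col_tbl <- select(dat, 1)  # one-column data frame
first_col_vec <- pull(dat, 1)    # plain vector

# Inside summarise(), look the name up by position and use the .data pronoun
idx <- 1  # hypothetical: position of the column to average
col_name <- names(dat)[idx]
res <- dat %>%
  group_by(ID_s) %>%
  summarise(meanMod = mean(.data[[col_name]]))

print(res)
```

Looking the name up with names(dat)[idx] keeps the summarise() call independent of where the grouping column sits.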
Upvotes: 0
Reputation: 6956
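First, you have to loop over seq_along(simVec) if you want to index your vector with s: with for(s in simVec), s is already the string itself, so simVec[s] returns NA.
Second, you are missing sym(): !!(simID) injects the string "Sim.Y1", so mean() receives a character value instead of the column. sym() converts the string to a symbol before it is injected.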
This should work:
library(dplyr)  # attaches %>% and sym() (re-exported from rlang)
simVec <- c('Sim.Y1.cor','Sim.Y3.cor')
for(s in seq_along(simVec)){
  simRef <- simVec[s]
  simID <- unlist(strsplit(simRef, split = '.cor', fixed = TRUE))[1]
  # sym() turns the string into a symbol; !! injects it into summarise()
  res <- dat %>% dplyr::group_by(ID_s) %>%
    dplyr::summarise(meanMod = mean(!!sym(simID)))
  print(res)  # results are not auto-printed inside a for loop
}
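An alternative to !!sym() is the .data pronoun, which accepts the column name as a string directly. A minimal sketch, assuming dplyr >= 1.0.0 and the question's sample data (with a set.seed() added); collecting the results in a named list with lapply() instead of printing inside a for loop is my choice, not part of the answer above:

```r
library(dplyr)

set.seed(1)
dat <- data.frame(Sim.Y1 = rnorm(10), Sim.Y2 = rnorm(10),
                  Sim.Y3 = rnorm(10), obsY = rnorm(10),
                  ID = sample(1:10, 10), ID_s = rep(1:2, each = 5))
simVec <- c('Sim.Y1.cor', 'Sim.Y3.cor')

# Column names without the literal ".cor" suffix
simIDs <- sub(".cor", "", simVec, fixed = TRUE)

# .data[[name]] looks a column up by a character string; one tibble per column
results <- lapply(setNames(simIDs, simIDs), function(simID) {
  dat %>%
    group_by(ID_s) %>%
    summarise(meanMod = mean(.data[[simID]]))
})

# Why the original attempt fails: !! injects the string itself,
# so mean() gets "Sim.Y1" instead of the column and returns NA
bad <- dat %>%
  group_by(ID_s) %>%
  summarise(meanMod = suppressWarnings(mean(!!simIDs[1])))

print(results)
print(bad)
```

The list is keyed by column name, so results$Sim.Y3 holds the per-group means of Sim.Y3.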
Upvotes: 1
Reputation: 31
library(dplyr)
# group by ID_s (as asked in the question), not by the per-row ID
dat %>% group_by(ID_s) %>%
  summarise(mean_y1 = mean(Sim.Y1),
            mean_y2 = mean(Sim.Y2),
            mean_y3 = mean(Sim.Y3),
            mean_obsY = mean(obsY))
Upvotes: 0