Reputation: 3805

Refer a column by variable name

Sample data

dat <- 
      data.frame(Sim.Y1 = rnorm(10), Sim.Y2 = rnorm(10),
                 Sim.Y3 = rnorm(10), obsY = rnorm(10),
                 ID = sample(1:10, 10), ID_s = rep(1:2, each = 5))

For the following vector, I want to calculate the mean across ID_s

simVec <- c('Sim.Y1.cor','Sim.Y2.cor')

for(s in simVec){

 simRef <- simVec[s]
 simID <- unlist(strsplit(simRef, split = '.cor',fixed = T))[1]   

 # this works  
 dat %>% dplyr::group_by(ID_s) %>%
 dplyr::summarise(meanMod = mean(Sim.Y1))

# this doesn't work
 dat %>% dplyr::group_by(ID_s) %>%
 dplyr::summarise(meanMod = mean(!!(simID)))
 }

How do I refer a column in dplyr not by its explicit name?

Upvotes: 0

Answers (4)

Artem Sokolov

Reputation: 13691

Note that your particular task can be performed without any non-standard evaluation by using summarize_at(), which works directly with strings:

simIDs <- stringr::str_split(simVec, ".cor") %>% purrr::map_chr(1)
# [1] "Sim.Y1" "Sim.Y2"

dat %>% dplyr::group_by(ID_s) %>% dplyr::summarise_at(simIDs, mean)
# # A tibble: 2 x 3
#    ID_s Sim.Y1  Sim.Y2
#   <int>  <dbl>   <dbl>
# 1     1  0.494 -0.0522
# 2     2 -0.104 -0.370

A custom suffix can also be supplied through the named list:

dat %>% dplyr::group_by(ID_s) %>% dplyr::summarise_at(simIDs, list(m=mean))
# # A tibble: 2 x 3
#    ID_s Sim.Y1_m Sim.Y2_m     <--- Note the _m suffix
#   <int>    <dbl>    <dbl>
# 1     1    0.494  -0.0522
# 2     2   -0.104  -0.370

Upvotes: 2

dswdsyd

Reputation: 616

I understand the question to be, how do you get a column without referencing the column name, i.e. using the index instead.

Let me know if my understanding is incorrect.

If not, I believe the easiest way would be as per below.

> df1 <- data.frame(ID_s=c('a','b','c'),Val=c('a1','b1','c1'))
> df1
  ID_s Val
1    a  a1
2    b  b1
3    c  c1
> df1[,1]
[1] a b c
Levels: a b c

If you want to save that as a dataframe, can be extended as per below:

cc <- data.frame(ID_s=df1[,1])

Hope this helps!

Upvotes: 0

mnist

Reputation: 6956

First, you have to use seq_along() if you want to index you vector with s. Second, you are missing sym().

This should work:

simVec <- c('Sim.Y1.cor','Sim.Y3.cor')

for(s in seq_along(simVec)){

  simRef <- simVec[s]
  simID <- unlist(strsplit(simRef, split = '.cor',fixed = T))[1]   

  # this works  
  dat %>% dplyr::group_by(ID_s) %>%
    dplyr::summarise(meanMod = mean(Sim.Y1))

  # this doesn't work
  dat %>% dplyr::group_by(ID_s) %>%
    dplyr::summarise(meanMod = mean(!!sym(simID)))
}

edit: no Typo

Upvotes: 1

SYED SALMAN

Reputation: 31

Try this


library(dplyr)

dat %>% group_by(ID) %>% 
  summarise(mean_y1 =mean(Sim.Y1), 
            mean_y2 =mean(Sim.Y2), 
            mean_y3 =mean(Sim.Y3), 
            mean_obsY = mean(obsY))

Upvotes: 0

Refer a column by variable name

Answers (4)

Try this

Related Questions