writer_typer

Reputation: 798

How to create a function by passing a column name as an argument without specifying data each time?

I have two issues:

  1. I want to create a function by passing a column name (I'm getting this error: Error in `$.default`(dat, "var") : $ operator is invalid for atomic vectors)
  2. Then, I would like to create a function without having to specify the data each time (I tried using attach(dat) but it didn't work)

I'm trying to create a function that lets a user input a dataset and a column name.

Here's the function I'm trying to create:

fre <- function(dat, var) {
      
            abc <- questionr::na.rm(dat$var)
            abc <- questionr::freq(abc)
            abc <- cbind(Label = rownames(abc), abc)
            abc <- questionr::rename.variable(abc, "n", "Frequency")
            abc <- questionr::rename.variable(abc, "%", "Percent")
            abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
            row.names(abc) <- NULL
            abc <- abc %>% dplyr::mutate(Value = gsub("[[:punct:]]", '', Value)) %>% dplyr::select(Label, Value, Frequency, Percent)
            abc
}

Reproducible example

library(haven)
#install.packages("questionr")
library(questionr)
library(dplyr)
library(tidyr)

# Load data
dat <- read_sav(url("http://staff.bath.ac.uk/pssiw/stats2/SAQ.sav"))


abc <- questionr::na.rm(dat$Q01)
abc <- questionr::freq(abc)
abc <- cbind(Label = rownames(abc), abc)
abc <- questionr::rename.variable(abc, "n", "Frequency")
abc <- questionr::rename.variable(abc, "%", "Percent")
abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
row.names(abc) <- NULL
abc <- abc %>% dplyr::mutate(Value = gsub("[[:punct:]]", '', Value)) %>% dplyr::select(Label, Value, Frequency, Percent)
abc

In the end, my output from the above code looks like this: [image not shown]

I'm trying to get this by using this function:

fre(dat, Q01)

but I'm getting this error:

Error in `$.default`(dat, "var") : $ operator is invalid for atomic vectors

How should I pass the column name in this function for it to work? I tried var <- enquo(var) but it didn't work.

For the second issue, I've tried using attach(dat) before calling a function, but it didn't work. Ideally, I would like to make the fre function work and then eventually use it without passing the data argument.

Upvotes: 0

Views: 66

Answers (2)

starja

Reputation: 10365

You actually don't need too much black magic here. I've made 2 versions of the function.

  • fre_pipe needs the data as an input argument, but it can be used with the pipe
  • fre_free relies on an object called global_dat that has to be defined in the calling environment

You don't need enquo here, because you don't need to capture the environment of your variable. ensym is enough: it ensures that var is treated as a symbol and is not evaluated. In a second step, you can use as_string to convert that symbol to a string. For further reading, see the metaprogramming chapters in Advanced R.
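To see what ensym and as_string do in isolation, here is a minimal sketch on a toy data frame. The names toy and col_name are hypothetical and only serve the illustration; they are not part of the functions below.

```r
library(rlang)

# Hypothetical toy data, standing in for the SPSS file used in this thread
toy <- data.frame(Q01 = c(1, 2, 2, NA), Q02 = c(3, 3, 1, 2))

# Hypothetical helper: returns the captured column name as a string
col_name <- function(var) {
  var <- ensym(var)  # capture the argument as a symbol without evaluating it
  as_string(var)     # convert the symbol to a character string
}

col_name(Q01)    # bare column name
col_name("Q01")  # a string is accepted too; ensym() turns it into a symbol

# The resulting string works with ordinary base R subsetting
toy[[col_name(Q01)]]
```

Because the helper ends up with a plain string, the rest of the function body can stay exactly as it was written for a quoted column name.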

library(haven)
library(questionr)
library(dplyr)
library(tidyr)

# Load data
dat <- read_sav(url("http://staff.bath.ac.uk/pssiw/stats2/SAQ.sav"))

fre_pipe <- function(.data, var) {
  var <- rlang::ensym(var)
  
  abc <- questionr::na.rm(.data[, rlang::as_string(var)])
  abc <- questionr::freq(abc)
  abc <- cbind(Label = rownames(abc), abc)
  abc <- questionr::rename.variable(abc, "n", "Frequency")
  abc <- questionr::rename.variable(abc, "%", "Percent")
  abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
  row.names(abc) <- NULL
  abc <- abc %>% dplyr::mutate(Value = gsub("[[:punct:]]", '', Value)) %>% dplyr::select(Label, Value, Frequency, Percent)
  abc
}

dat %>% fre_pipe(Q01)
#>               Label Value Frequency Percent
#> 1    Strongly agree     1       270    10.5
#> 2             Agree     2      1338    52.0
#> 3           Neither     3       735    28.6
#> 4          Disagree     4       187     7.3
#> 5 Strongly disagree     5        41     1.6
#> 6      Not answered     9         0     0.0

fre_free <- function(var) {
  var <- rlang::ensym(var)
  
  abc <- questionr::na.rm(global_dat[, rlang::as_string(var)])
  abc <- questionr::freq(abc)
  abc <- cbind(Label = rownames(abc), abc)
  abc <- questionr::rename.variable(abc, "n", "Frequency")
  abc <- questionr::rename.variable(abc, "%", "Percent")
  abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
  row.names(abc) <- NULL
  abc <- abc %>% dplyr::mutate(Value = gsub("[[:punct:]]", '', Value)) %>% dplyr::select(Label, Value, Frequency, Percent)
  abc
}

global_dat <- dat

fre_free(Q01)
#>               Label Value Frequency Percent
#> 1    Strongly agree     1       270    10.5
#> 2             Agree     2      1338    52.0
#> 3           Neither     3       735    28.6
#> 4          Disagree     4       187     7.3
#> 5 Strongly disagree     5        41     1.6
#> 6      Not answered     9         0     0.0

Created on 2020-09-05 by the reprex package (v0.3.0)

I don't think that fre_free without the data argument is good style. If you're tired of always repeating the argument, maybe you want to apply your function repeatedly with lapply or map? Something like:

vector_with_column_names %>% 
  purrr::walk(~print(fre(dat = dat, var = .x)))

(Note that a plain c(Q01, Q02) would not work here: you would need either a helper that builds a vector of symbols, or a character vector of column names.)
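As a rough sketch of the column-names route, the loop could look like the following in base R. It assumes a function that takes the column name as a string; toy and count_values are hypothetical stand-ins for dat and fre, and the body is deliberately simpler than the questionr pipeline.

```r
# Hypothetical toy data, standing in for dat
toy <- data.frame(Q01 = c(1, 2, 2, NA), Q02 = c(3, 3, 1, 2))

# Hypothetical stand-in for fre(): counts non-missing values per level
count_values <- function(dat, var) {
  values <- dat[[var]]              # var is a string, so [[ ]] can look it up
  values <- values[!is.na(values)]  # drop missing values
  as.data.frame(table(Value = values), responseName = "Frequency")
}

# Apply the function to several columns, given as a character vector
cols <- c("Q01", "Q02")
results <- lapply(cols, function(col) count_values(toy, col))
names(results) <- cols

results$Q01
```

lapply keeps the results in a named list, so individual tables can be inspected later instead of only being printed.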

Upvotes: 1

writer_typer

Reputation: 798

Based on r2evans's comments, this worked:

fre <- function(dat, var) {
            abc <- questionr::na.rm(dat[[var]])
            abc <- questionr::freq(abc)
            abc <- cbind(Label = rownames(abc), abc)
            abc <- questionr::rename.variable(abc, "n", "Frequency")
            abc <- questionr::rename.variable(abc, "%", "Percent")
            abc <- tidyr::separate(abc, Label, into = c("Value", "Label"), sep = "] ")
            row.names(abc) <- NULL
            abc <- abc %>% dplyr::mutate(Value = gsub("[[:punct:]]", '', Value)) %>% dplyr::select(Label, Value, Frequency, Percent)
            abc
}

fre(dat, "Q01")

However, I'm still looking for a way to avoid passing the data argument each time. It would be a bonus to also avoid the quotes around the column name.
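One possible approach, offered only as a sketch and not taken from the answers in this thread, is a function factory: bind the data once and get back a function that needs only the column. The names toy, make_counter and count_toy are hypothetical, and the body is a simplified base R stand-in for the questionr steps.

```r
# Hypothetical toy data, standing in for dat
toy <- data.frame(Q01 = c(1, 2, 2, NA), Q02 = c(3, 3, 1, 2))

# Hypothetical factory: binds a data set once and returns a function
# that takes only the column
make_counter <- function(dat) {
  force(dat)  # evaluate dat now, so later changes to the original do not leak in
  function(var) {
    # substitute() captures the argument unevaluated; as.character() turns
    # either a bare name or a quoted string into a plain string
    col <- as.character(substitute(var))
    values <- dat[[col]]
    values <- values[!is.na(values)]
    as.data.frame(table(Value = values), responseName = "Frequency")
  }
}

count_toy <- make_counter(toy)
count_toy(Q01)    # bare name, no data argument
count_toy("Q02")  # a quoted name works as well
```

One caveat of capturing the argument this way: a variable that holds a column name (for example col <- "Q01"; count_toy(col)) would be read as a column literally named col, so this style suits interactive use better than programmatic loops.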

Upvotes: 0
