Jane Sully

Reputation: 3337

How to index dataframe column inside a function in R

I have a function that takes a data frame, a percentile threshold, and the name of a column, and adds a new column flagging whether each value in that column is at or above the threshold (0 for <, 1 for >=). However, I can't use df$column_name inside the quantile function, because column_name is not an actual column name but a variable storing the column name. As a result, df$column_name returns NULL. Is there any way to work around this and keep the code format similar to what it is now? Or do I have to pass the column's numerical index instead of its name? That would work, but it is definitely not as convenient or readable as passing in the column name.

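library(dplyr)

func1 <- function(df, threshold, column_name) {
  # df$column_name looks for a column literally named "column_name",
  # not the column whose name is stored in column_name, so it is NULL
  threshold_value <- quantile(df$column_name, c(threshold)) 
  new_df <- df %>%
    mutate(ifelse(column_name > threshold_value, 1, 0)) 
  return(new_df)
}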

Thank you so much for your help!

Upvotes: 3

Views: 1697

Answers (1)

www

Reputation: 39154

I modified your function as follows. It now takes a data frame, a threshold, and a column name given as a string. The function only needs base R.

# Modified function
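func1 <- function(df, threshold, column_name) {
  # [[ ]] evaluates column_name, so it looks up the column whose name is stored in it
  threshold_value <- quantile(df[[column_name]], threshold) 
  new_df <- df
  # 1 if the value is strictly above the threshold, 0 otherwise
  new_df[["new_col"]] <- ifelse(df[[column_name]] > threshold_value, 1, 0) 
  return(new_df)
}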
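The key difference between the question's code and this version is `$` versus `[[`. A minimal sketch using the built-in `trees` data shows it: `$` treats its right-hand side as a literal name, while `[[` evaluates it. The variable name `column_name` mirrors the function argument and is only for illustration.

```r
# `trees` ships with base R's datasets package
column_name <- "Volume"

# `$` does not evaluate column_name: it looks for a column literally
# named "column_name", finds none, and returns NULL
via_dollar <- trees$column_name

# `[[` evaluates its argument, so it looks up the column named "Volume"
via_brackets <- trees[[column_name]]

is.null(via_dollar)                    # TRUE
identical(via_brackets, trees$Volume)  # TRUE
```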

# Take the trees data frame as an example
head(trees)
#   Girth Height Volume
# 1   8.3     70   10.3
# 2   8.6     65   10.3
# 3   8.8     63   10.2
# 4  10.5     72   16.4
# 5  10.7     81   18.8
# 6  10.8     83   19.7

# Apply the function
func1(trees, 0.5, "Volume")
#    Girth Height Volume new_col
# 1    8.3     70   10.3       0
# 2    8.6     65   10.3       0
# 3    8.8     63   10.2       0
# 4   10.5     72   16.4       0
# 5   10.7     81   18.8       0
# 6   10.8     83   19.7       0
# 7   11.0     66   15.6       0
# 8   11.0     75   18.2       0
# 9   11.1     80   22.6       0
# 10  11.2     75   19.9       0
# 11  11.3     79   24.2       0
# 12  11.4     76   21.0       0
# 13  11.4     76   21.4       0
# 14  11.7     69   21.3       0
# 15  12.0     75   19.1       0
# 16  12.9     74   22.2       0
# 17  12.9     85   33.8       1
# 18  13.3     86   27.4       1
# 19  13.7     71   25.7       1
# 20  13.8     64   24.9       1
# 21  14.0     78   34.5       1
# 22  14.2     80   31.7       1
# 23  14.5     74   36.3       1
# 24  16.0     72   38.3       1
# 25  16.3     77   42.6       1
# 26  17.3     81   55.4       1
# 27  17.5     82   55.7       1
# 28  17.9     80   58.3       1
# 29  18.0     80   51.5       1
# 30  18.0     80   51.0       1
# 31  20.6     87   77.0       1

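If you still want to use dplyr, it is essential to learn how to deal with non-standard evaluation. Please see this vignette to learn more (https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html). The following code also works. Note that it takes the column name unquoted (Volume rather than "Volume") and uses `>=`, so row 11, whose value equals the median, gets 1.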

library(dplyr)

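func2 <- function(df, threshold, column_name) {
  # Capture the bare column name (e.g. Volume) as a quosure
  col_en <- enquo(column_name)
  # !! unquotes the quosure so pull() sees the actual column
  threshold_value <- quantile(df %>% pull(!!col_en), threshold)
  new_df <- df %>%
    # >= means values equal to the threshold get 1
    mutate(new_col := ifelse(!!col_en >= threshold_value, 1, 0))
  return(new_df)
}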

func2(trees, 0.5, Volume)
#    Girth Height Volume new_col
# 1    8.3     70   10.3       0
# 2    8.6     65   10.3       0
# 3    8.8     63   10.2       0
# 4   10.5     72   16.4       0
# 5   10.7     81   18.8       0
# 6   10.8     83   19.7       0
# 7   11.0     66   15.6       0
# 8   11.0     75   18.2       0
# 9   11.1     80   22.6       0
# 10  11.2     75   19.9       0
# 11  11.3     79   24.2       1
# 12  11.4     76   21.0       0
# 13  11.4     76   21.4       0
# 14  11.7     69   21.3       0
# 15  12.0     75   19.1       0
# 16  12.9     74   22.2       0
# 17  12.9     85   33.8       1
# 18  13.3     86   27.4       1
# 19  13.7     71   25.7       1
# 20  13.8     64   24.9       1
# 21  14.0     78   34.5       1
# 22  14.2     80   31.7       1
# 23  14.5     74   36.3       1
# 24  16.0     72   38.3       1
# 25  16.3     77   42.6       1
# 26  17.3     81   55.4       1
# 27  17.5     82   55.7       1
# 28  17.9     80   58.3       1
# 29  18.0     80   51.5       1
# 30  18.0     80   51.0       1
# 31  20.6     87   77.0       1
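More recent dplyr releases also offer two shorter forms, both covered in the programming vignette linked above: the `.data[[ ]]` pronoun, which keeps the string-based call of func1, and the embrace operator `{{ }}` (rlang 0.4.0 and later), which replaces the `enquo()`/`!!` pair. A minimal sketch, assuming a reasonably recent dplyr; `func_string` and `func_bare` are hypothetical names, and both use `>=` to match the question.

```r
library(dplyr)

# Hypothetical variant taking the column name as a string, like func1.
# .data[[column_name]] looks the column up in the data frame by name.
func_string <- function(df, threshold, column_name) {
  threshold_value <- quantile(df[[column_name]], threshold)
  df %>%
    mutate(new_col = ifelse(.data[[column_name]] >= threshold_value, 1, 0))
}

# Hypothetical variant taking a bare column name, like func2.
# {{ column_name }} forwards the unquoted argument to pull() and mutate().
func_bare <- function(df, threshold, column_name) {
  threshold_value <- quantile(pull(df, {{ column_name }}), threshold)
  df %>%
    mutate(new_col = ifelse({{ column_name }} >= threshold_value, 1, 0))
}

res_string <- func_string(trees, 0.5, "Volume")
res_bare   <- func_bare(trees, 0.5, Volume)
identical(res_string, res_bare)  # TRUE
sum(res_string$new_col)          # 16
```

The string version is handy when column names come from a character vector (for example, looping over `names(df)`), while the embraced version reads more like ordinary dplyr code at the call site.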

Upvotes: 6
