titeuf

Reputation: 163

Why does print() change the output of my function?

I am working on a function that returns the top values of a column. The example below is only part of my full function; my final goal is to run the function in a loop. I noticed something odd: why does print(df_col_indicator) change the result when I define "df_col_indicator" externally rather than directly in the function call? With print(df_col_indicator) included, my function does exactly what I want.

library(dplyr)
library(tidyverse)

remove(list = ls())


dataframe_test <- data.frame(
  county_name = c("a", "b","c", "d","e", "f", "g", "h"),
  column_test1 = c(100,100,100,100,100,100,50,50),
  column_test2 = c(40,90,50,40,40,100,13,14),
  column_test3 = c(100,90,50,40,30,40,100,50),
  month = c("2020-09-01", "2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-09-01" ,"2020-08-01","2020-08-01"))


choose_top_5 <- function(df, df_col_indicator, df_col_month, char_month, numb_top, df_col_county) {
  
  ### this here changes output of my function
  #print(df_col_indicator) # changes output of my function depending on included or excluded
  
  ### enquo / ensym / deparse
  df_col_indicator_ensym <- ensym(df_col_indicator)
  
  df_col_month_ensym <- ensym(df_col_month)
  
  
  ### filter month and top 5 observations
  df_top <- df %>%
    filter(!!df_col_month_ensym == char_month) %>%
    slice_max(!!df_col_indicator_ensym, n = numb_top) %>%
    select(!!df_col_county, !!df_col_month_ensym, !!df_col_indicator_ensym)
  
  
  
  return(df_top)
  
  
}




### define "df_col_indicator" within the function
a = choose_top_5(df = dataframe_test, df_col_indicator = "column_test3",
                 df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                 df_col_county = "county_name")

a


### define "df_col_indicator" externally
external = "column_test3"

b = choose_top_5(df = dataframe_test, df_col_indicator = external,
                 df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                 df_col_county = "county_name")
b



### goal is to run function over loop
external <- c("column_test1","column_test2","column_test3")

my_list <- list()

for (i in external) {
  
  my_list[[i]] <- choose_top_5(df = dataframe_test, df_col_indicator = i,
                               df_col_month = "month", char_month = "2020-09-01", numb_top = 5,
                               df_col_county = "county_name")
}

my_list



Upvotes: 1

Views: 172

Answers (2)

ekoam

Reputation: 8844

You have to change ensym to as.symbol.

Consider a function like this

f <- function(x) ensym(x)
myvar <- "some string"

You will find that

> f("some string")
`some string`

> f(myvar)
myvar

This is because ensym only looks one step back: it captures whatever expression was supplied as the argument, attempts to convert it into a symbol, and returns that (note that if the expression is neither a string nor a variable name, you will get an error). As such, in your first example, ensym returns column_test3; in your second one, it returns external.

As far as I can tell, what you want is to get the value that df_col_indicator represents and then convert that value into a symbol. This means you have to first evaluate df_col_indicator and then convert. as.symbol does what you need.

g <- function(x) as.symbol(x)
myvar <- "some string"

Some tests

> g("some string")
`some string`

> g(myvar)
`some string`
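Applied to the function in the question, the change could look like the sketch below. The name choose_top_5_fixed is made up for this illustration, and all_of() is used for the county column as a stylistic choice rather than something the question requires. It assumes dplyr 1.0.0 or later for slice_max.

```r
library(dplyr)

dataframe_test <- data.frame(
  county_name = c("a", "b", "c", "d", "e", "f", "g", "h"),
  column_test1 = c(100, 100, 100, 100, 100, 100, 50, 50),
  column_test2 = c(40, 90, 50, 40, 40, 100, 13, 14),
  column_test3 = c(100, 90, 50, 40, 30, 40, 100, 50),
  month = c(rep("2020-09-01", 6), rep("2020-08-01", 2))
)

# Hypothetical rewrite of choose_top_5: as.symbol() evaluates its argument
# first, so it does not matter whether the column name arrives as a literal
# string or through a variable.
choose_top_5_fixed <- function(df, df_col_indicator, df_col_month,
                               char_month, numb_top, df_col_county) {
  indicator_sym <- as.symbol(df_col_indicator)
  month_sym <- as.symbol(df_col_month)

  df %>%
    filter(!!month_sym == char_month) %>%
    slice_max(!!indicator_sym, n = numb_top) %>%
    select(all_of(df_col_county), !!month_sym, !!indicator_sym)
}

# Column name supplied through an external variable
external <- "column_test3"
b <- choose_top_5_fixed(df = dataframe_test, df_col_indicator = external,
                        df_col_month = "month", char_month = "2020-09-01",
                        numb_top = 5, df_col_county = "county_name")
b
```

With this version, the result has a column named column_test3 whether or not anything is printed inside the function.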

Upvotes: 2

Allan Cameron

Reputation: 174506

Your example is quite lengthy. Let's boil it down to a minimal reproducible example with two very similar functions. These both take a single argument and simply print the passed variable to the console, and return the result of calling ensym on the same variable.

The only difference between the two is the order in which the calls to print and ensym are made.

library(rlang)

test_ensym1 <- function(x)
{
  result <- ensym(x)
  print(x)
  return(result)
}

test_ensym2 <- function(x)
{
  print(x)
  result <- ensym(x)
  return(result)
}

Now we might expect these two functions to do exactly the same thing, and indeed when we pass a string directly to them, they both give the same result:

test_ensym1("hello")
#> [1] "hello"
#> hello

test_ensym2("hello")
#> [1] "hello"
#> hello

But look what happens when we use an external variable to pass in our string:

y <- "hello"

test_ensym1(y)
#> [1] "hello"
#> y

test_ensym2(y)
#> [1] "hello"
#> hello

Both functions still print "hello", as expected, but they return different results. When we called ensym first, the function returned the symbol y, and when we called print first, it returned the symbol hello.

The reason for this is that when you call a function in R, the expressions you pass as arguments are not evaluated immediately. Instead, they are wrapped in promise objects and evaluated only when they are needed in the body of the function. It is this lazy evaluation that allows for some of the tidyverse trickery.
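Lazy evaluation can be seen in base R alone, without rlang. The two helper functions below are made up for this illustration; they only show when an argument expression actually runs.

```r
# Hypothetical helper: the body never touches x, so the promise is never
# forced and the stop() call in the argument never runs.
ignores_arg <- function(x) {
  "x was never evaluated"
}

ignores_arg(stop("this error is never raised"))

# Hypothetical helper: the argument expression runs only when x is first
# used, which happens after the first message has been emitted.
uses_arg <- function(x) {
  message("body started")
  x
}

uses_arg({
  message("argument evaluated")
  42
})
```

The first call returns its string without raising an error. In the second call, "body started" is reported before "argument evaluated", because the argument is only evaluated at the point where the body first uses x.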

The difference between the two functions above is that calling print(x) forces the evaluation of x. Before that point, x is an unevaluated promise that still carries the expression you passed in. Afterwards, it behaves just like any other variable you would use interactively in the console, so when you call ensym, it sees the evaluated value rather than the unevaluated expression.

ensym, on the other hand, does not evaluate x, so if ensym is called first, it will return the unevaluated symbol that was passed to the function.

So actually, the easiest way to fix your problem is to move print to after the ensym call.
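If the column names always arrive as strings, as in the loop from the question, another option is to avoid symbol capture altogether and index with the .data pronoun. This is a sketch of that alternative rather than the approach described above; the function name choose_top_by_name is made up for this illustration, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

dataframe_test <- data.frame(
  county_name = c("a", "b", "c", "d", "e", "f", "g", "h"),
  column_test1 = c(100, 100, 100, 100, 100, 100, 50, 50),
  column_test2 = c(40, 90, 50, 40, 40, 100, 13, 14),
  column_test3 = c(100, 90, 50, 40, 30, 40, 100, 50),
  month = c(rep("2020-09-01", 6), rep("2020-08-01", 2))
)

# Hypothetical variant: every column is given as a string and looked up
# with .data[[...]], so the arguments are evaluated normally and the order
# of print() and other calls in the body no longer matters.
choose_top_by_name <- function(df, indicator, month_col, char_month,
                               numb_top, county_col) {
  df %>%
    filter(.data[[month_col]] == char_month) %>%
    slice_max(.data[[indicator]], n = numb_top) %>%
    select(all_of(c(county_col, month_col, indicator)))
}

# Run the function over several indicator columns
external <- c("column_test1", "column_test2", "column_test3")

my_list <- list()

for (i in external) {
  my_list[[i]] <- choose_top_by_name(df = dataframe_test, indicator = i,
                                     month_col = "month",
                                     char_month = "2020-09-01",
                                     numb_top = 5,
                                     county_col = "county_name")
}

my_list
```

Note that slice_max keeps ties by default, so an element of my_list can contain more than numb_top rows when several counties share the cut-off value.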

Upvotes: 4
