tadeufontes

Reputation: 477

Why isn't my code working inside a function I created, but works when I run it in the global environment?

I have a dataframe "fish8" and I tried writing a function that removes all rows where any of three columns (BIN, collectors, country) is empty. The code doesn't seem to take effect when run inside the function, but it works when run outside of it. I have many other similar functions in the script and they work, so why isn't this one working?

#so it doesn't work when I run it like this
remove_empties=function(fish8){
  fish8<<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
  fish8<<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
  fish8<<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]
}
remove_empties(fish8)

#but it runs like this
fish8<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
fish8<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
fish8<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]

Upvotes: 1

Views: 849

Answers (1)

Cole

Reputation: 11255

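The problem is related to the scope of variables. Inside the function, fish8 refers to the function's argument, a local copy that <<- never modifies. Each <<- line filters that unchanged local copy and overwrites the global fish8 with the result, so the three filters don't build on each other and only the last one (country) survives. See https://www.r-bloggers.com/dont-run-afoul-of-scoping-rules-in-r/ :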

What happens with <<- is that it starts walking up the environment tree from child to parent until it either finds a match, or ends up in the global (top) environment. This is a way to initiate a tree-walk (like automatic searching) but with dire consequences because you are making an assignment outside of the current scope! Only the first match it finds will get changed, whether or not it is at the global environment.
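A minimal sketch of how <<- behaves in a function shaped like the one in the question. The toy data below is hypothetical (it is not the asker's real fish8) and only two of the three filters are shown; the point is that the right-hand side reads the function's local argument, while the assignment lands in the enclosing (here global) environment.

```r
# Hypothetical toy data; column names mirror the question's.
fish8 <- data.frame(
  BIN        = c("A1", "",   "A3", NA),
  collectors = c("x",  "y",  "",   "z"),
  country    = c("PT", "ES", "FR", "BR"),
  stringsAsFactors = FALSE
)

remove_empties <- function(fish8) {
  # The right-hand side reads the local argument, which is never modified;
  # <<- writes to the fish8 found in the enclosing environment.
  fish8 <<- fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
  fish8 <<- fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
  nrow(fish8)  # row count of the local copy
}

local_rows <- remove_empties(fish8)
local_rows   # 4: the local copy still has every row
nrow(fish8)  # 3: only the collectors filter survived
```

After the call, the outer fish8 still contains the rows with an empty or NA BIN, because the second <<- overwrote the result of the first.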

Your options include:

  1. Remove the double assignment and reassign the results of the function to the original dataframe
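remove_empties = function(fish8) {
  # plain <- only modifies the function's local copy
  fish8 <- fish8[!(fish8$x == '' | is.na(fish8$x)), ]
  fish8 <- fish8[!(fish8$y == '' | is.na(fish8$y)), ]
  fish8  # return the filtered data frame explicitly
}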

fish8 <- remove_empties(fish8)
  2. Use a different variable name within the function, which is better practice than having the same variable name in two different environments.
remove_empties2 = function(fish) {
  fish <- fish[!(fish$x == '' | is.na(fish$x)), ]
  fish <- fish[!(fish$y == '' | is.na(fish$y)), ]
  fish  # return the filtered data frame explicitly
}

fish8 <- remove_empties2(fish8)
  3. Change the variable name in the function but globally assign the original variable. I don't like this route:
remove_empties3 = function(fish) {
  fish8 <<- fish[!(fish$x == '' | is.na(fish$x))
                 & !(fish$y == '' | is.na(fish$y)), ]
}

remove_empties3(fish8)
  4. My favorite option: reassign the empty strings as NA and then use na.omit(). I'd also forgo the function: this is at most one line more than a function call and should only have to be done once, as empty strings shouldn't be re-introduced:
fish8[fish8==''] <- NA_character_
fish8 <- na.omit(fish8)
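
Note that na.omit() drops rows with an NA in any column. If only the three columns from the question should be checked, a sketch along these lines may help. The toy data and the notes column are hypothetical, standing in for columns that are allowed to be empty:

```r
# Hypothetical toy data; `notes` stands in for any column that may be empty.
fish8 <- data.frame(
  BIN        = c("A1", "",   "A3", "A4"),
  collectors = c("x",  "y",  NA,   "z"),
  country    = c("PT", "ES", "FR", "BR"),
  notes      = c(NA,   "ok", "ok", ""),
  stringsAsFactors = FALSE
)

key_cols <- c("BIN", "collectors", "country")

# Turn empty strings into NA in the key columns only.
fish8[key_cols][fish8[key_cols] == ""] <- NA

# Keep rows that are complete in the key columns; other columns are ignored.
fish8 <- fish8[complete.cases(fish8[key_cols]), ]
```

Here complete.cases() is applied to the selected columns only, so rows with an empty or NA notes value are kept.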

Data:

set.seed(1)
x <- sample(c('', NA_character_, letters[1:5]), 20, replace = TRUE)
y <- sample(c('', NA_character_, letters[6:10]), 20, replace = TRUE)

fish8 <- data.frame(x, y)

Upvotes: 2
