Reputation: 477
I have a dataframe "fish8" and wrote a function that should drop every row that is empty in any of three columns (BIN, collectors, country). The code has no effect when it runs inside the function, but it works when I run it outside of it. I have many other similar functions in the script and they work, so why isn't this one working?
#so it doesn't work when I run it like this
remove_empties=function(fish8){
fish8<<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
fish8<<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
fish8<<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]
}
remove_empties(fish8)
#but it runs like this
fish8<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
fish8<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
fish8<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]
Upvotes: 1
Views: 849
Reputation: 11255
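The problem is related to the scope of variables. Inside the function, fish8 refers to the function's argument, which is local to the function. <<- skips the function's own scope and assigns to the fish8 in the enclosing (here, global) environment, so the local fish8 is never updated. Each line therefore filters the original, unfiltered argument and overwrites the global fish8, and only the last filter (country) survives. See https://www.r-bloggers.com/dont-run-afoul-of-scoping-rules-in-r/ :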
What happens with <<- is that it starts walking up the environment tree from child to parent until it either finds a match, or ends up in the global (top) environment. This is a way to initiate a tree-walk (like automatic searching) but with dire consequences because you are making an assignment outside of the current scope! Only the first match it finds will get changed, whether or not it is at the global environment.
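To make the effect concrete, here is a minimal sketch on a toy data frame. The names toy, a, b and broken are made up for illustration; they are not from the question.

```r
# Hypothetical toy data standing in for fish8
toy <- data.frame(a = c("", "x", "y"),
                  b = c("p", "", "q"),
                  stringsAsFactors = FALSE)

broken <- function(toy) {
  # The right-hand sides read the local argument `toy`, which is never modified.
  # Each `<<-` overwrites the global `toy` with a fresh filter of the original.
  toy <<- toy[!(toy$a == "" | is.na(toy$a)), ]
  toy <<- toy[!(toy$b == "" | is.na(toy$b)), ]
}

broken(toy)
print(toy)  # only the filter on `b` took effect; the row with empty `a` remains
```

The global object does change, but it ends up holding only the result of the last line, which is why it looks as if the function did nothing for the earlier columns.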
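Your options include:
# Option 1: keep the argument name, assign locally with <-, and return the result
remove_empties = function(fish8) {
  fish8 <- fish8[!(fish8$x == '' | is.na(fish8$x)), ]
  fish8 <- fish8[!(fish8$y == '' | is.na(fish8$y)), ]
  fish8
}
fish8 <- remove_empties(fish8)
# Option 2: same idea, with an argument name that doesn't shadow the global fish8
remove_empties2 = function(fish) {
  fish <- fish[!(fish$x == '' | is.na(fish$x)), ]
  fish <- fish[!(fish$y == '' | is.na(fish$y)), ]
  fish
}
fish8 <- remove_empties2(fish8)
# Option 3: keep <<-, but apply both filters in a single assignment to the global fish8
remove_empties3 = function(fish) {
  fish8 <<- fish[!(fish$x == '' | is.na(fish$x))
                 & !(fish$y == '' | is.na(fish$y)), ]
}
remove_empties3(fish8)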
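If more columns need the same treatment, the local-assignment approach can take the column names as an argument. This is only a sketch: remove_empties_cols and the toy data are hypothetical, with column names borrowed from the question.

```r
# Hypothetical helper: drop rows where any of the named columns is "" or NA
remove_empties_cols <- function(df, cols) {
  for (col in cols) {
    # is.na() comes first so that NA values yield TRUE rather than NA
    keep <- !(is.na(df[[col]]) | df[[col]] == "")
    df <- df[keep, , drop = FALSE]
  }
  df  # return the filtered copy; the caller decides where to store it
}

# Toy data for illustration only
toy <- data.frame(BIN        = c("A1", "", NA, "B2"),
                  collectors = c("Ann", "Bob", "Cy", ""),
                  country    = c("PT", "ES", "FR", "IT"),
                  stringsAsFactors = FALSE)

toy_clean <- remove_empties_cols(toy, c("BIN", "collectors", "country"))
print(toy_clean)
```

Because the function returns its result instead of writing to the global environment, it works on any data frame, not just one called fish8.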
Another option is to replace the empty strings with NA and then use na.omit(). I'd also forgo the function: this is at most one line more than a function call, and it should only have to be done once, since empty strings shouldn't be re-introduced:
fish8[fish8==''] <- NA_character_
fish8 <- na.omit(fish8)
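One caveat: na.omit() drops a row if any column is NA, not only the columns of interest. If the data frame has other columns that may legitimately contain NA, a variant using complete.cases() on just the relevant columns might look like this. The toy data and the notes column are assumptions for illustration.

```r
# Toy data for illustration only; `notes` is a column where NA is acceptable
toy <- data.frame(BIN        = c("A1", "", NA, "B2"),
                  collectors = c("Ann", "Bob", "Cy", ""),
                  country    = c("PT", "ES", "FR", "IT"),
                  notes      = c(NA, "x", "y", "z"),
                  stringsAsFactors = FALSE)

cols <- c("BIN", "collectors", "country")

# Turn empty strings into NA, but only in the columns being checked
sub <- toy[cols]
sub[sub == ""] <- NA
toy[cols] <- sub

# Keep rows that are complete in those columns; NA in `notes` is ignored
toy_clean <- toy[complete.cases(toy[cols]), , drop = FALSE]
print(toy_clean)
```

Here na.omit(toy) would also remove the first row because of the NA in notes, whereas the complete.cases() version keeps it.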
Data:
set.seed(1)
x <- sample(c('', NA_character_, letters[1:5]), 20, replace = TRUE)
y <- sample(c('', NA_character_, letters[6:10]), 20, replace = TRUE)
fish8 <- data.frame(x, y)
Upvotes: 2