Reputation: 160
I am encountering an issue when I use the extraction operator `$() inside of a function. The problem does not exist if I follow the same logic outside of the loop, so I assume there might be a scoping issue that I'm unaware of.
The general setup:
## Make some fake data for your reproducible needs.
set.seed(2345)
my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
cat_2 = sample(c("c", "d"), 100, replace = TRUE),
continuous = rnorm(100),
stringsAsFactors = FALSE)
head(my_df)
This process I am trying to dynamically reproduce:
index <- which(`$`(my_df, "cat_1") == "a")
my_df$continuous[index]
But once I program this logic into a function, it fails:
## Function should take a string for the following:
## cat_var - string with the categorical variable name as it appears in df
## level - a level of cat_var appearing in df
## df - data frame to operate on. Function assumes it has a column
## "continuous".
extract_sample <- function(cat_var, level, df = my_df) {
index <- which(`$`(df, cat_var) == level)
df$continuous[index]
}
## Does not work.
extract_sample(cat_var = "cat_1", level = "a")
This is returning numeric(0)
. Any thoughts on what I'm missing? Alternative approaches are welcome as well.
Upvotes: 3
Views: 73
Reputation: 37794
The problem is that the $
is non-standard, in the sense that when you don't quote the parameter input, it still tries to parse it and use what you typed, even if that was meant to refer to another variable.
Or more simply, as @42 put it in the first comment in the linked question:
The "$" function does not evaluate its arguments, whereas "[[" does`.
Here's a much simpler data set as an example.
my_df <- data.frame(a=c(1,2))
v <- "a"
Compare the usual usage; the first two give the same result, if you don't quote it, it parses it. So the third one (now) clearly doesn't work properly.
my_df$"a"
## [1] 1 2
my_df$a
## [1] 1 2
my_df$v
## NULL
That's exactly what's happening to you:
`$`(my_df, "a")
## [1] 1 2
`$`(my_df, v)
## NULL
Instead we need to evaluate v
before sending to $
by using do.call
.
do.call(`$`, list(my_df, v))
## [1] 1 2
Or, more appropriately, use the [[
version which does evaluate the parameters first.
`[[`(my_df, v)
## [1] 1 2
Upvotes: 3
Reputation: 4537
The problem isn't the function, it's the way $
handles the input.
cat_var = "cat_1"
length(`$`(my_df,"cat_1"))
#> [1] 100
length(`$`(my_df,cat_var))
#> [1] 0
You can instead use [[
to achieve your desired outcome.
cat_var = "cat_1"
length(`[[`(my_df,"cat_1"))
#> [1] 100
length(`[[`(my_df,cat_var))
#> [1] 100
UPDATE
It's been noted that using [[
this way is ugly. And it is. It's useful when you want to write something like lapply(stuff,'[[',1)
Here, you should probably be writing it as my_df[[cat_var]]
.
Also, this question/answer goes into a little more detail about why $
doesn't work the way you want it to.
Upvotes: 4
Reputation: 4169
Problem lies in the way you are indexing to the column. This works just making a slight tweak to yours:
extract_sample <- function(cat_var, level, df = my_df) {
index <- df[, cat_var] == level
df$continuous[index]
}
Using it dynamically:
> extract_sample(cat_var = "cat_2", level = "d")
[1] -0.42769207 -0.75650031 0.64077840 -1.02986889 1.34800344 0.70258431 1.25193247
[8] -0.62892048 0.48822673 0.10432070 1.11986063 -0.88222370 0.39158408 1.39553002
[15] -0.51464283 -1.05265106 0.58391650 0.10555913 0.16277385 -0.55387829 -1.07822831
[22] -1.23894422 -2.32291394 0.11118881 0.34410388 0.07097271 1.00036812 -2.01981056
[29] 0.63417799 -0.53008375 1.16633422 -0.57130500 0.61614135 1.06768285 0.74182293
[36] 0.56538633 0.16784205 -0.14757303 -0.70928924 -1.91557732 0.61471302 -2.80741967
[43] 0.40552376 -1.88020372 -0.38821089 -0.42043745 1.87370600 -0.46198139 0.10788358
[50] -1.83945868 -0.11052531 -0.38743950 0.68110902 -1.48026285
Upvotes: 1