apax

Reputation: 160

Extraction operator `$`() returns zero-length vectors within function

I am encountering an issue when I use the extraction operator `$`() inside a function. The problem does not occur when I follow the same logic outside of the function, so I assume there is a scoping issue that I'm unaware of.

The general setup:

## Make some fake data for your reproducible needs.
set.seed(2345)

my_df <- data.frame(cat_1 = sample(c("a", "b"), 100, replace = TRUE),
                    cat_2 = sample(c("c", "d"), 100, replace = TRUE),
                    continuous  = rnorm(100),
                    stringsAsFactors = FALSE)
head(my_df)

This is the process I am trying to reproduce dynamically:

index <- which(`$`(my_df, "cat_1") == "a")

my_df$continuous[index]

But once I program this logic into a function, it fails:

## Function should take a string for the following:
##  cat_var - string with the categorical variable name as it appears in df
##  level - a level of cat_var appearing in df
##  df - data frame to operate on.  Function assumes it has a column 
##    "continuous".
extract_sample <- function(cat_var, level, df = my_df) {

  index <- which(`$`(df, cat_var) == level)

  df$continuous[index]

}

## Does not work.
extract_sample(cat_var = "cat_1", level = "a")

This is returning numeric(0). Any thoughts on what I'm missing? Alternative approaches are welcome as well.

Upvotes: 3

Views: 73

Answers (3)

Aaron - mostly inactive

Reputation: 37794

The problem is that $ uses non-standard evaluation: it does not evaluate its second argument. Whether or not you quote the name, it takes what you typed literally as the column name, even if you meant it to refer to another variable.

Or more simply, as @42 put it in the first comment in the linked question:

The "$" function does not evaluate its arguments, whereas "[[" does.

Here's a much simpler data set as an example.

my_df <- data.frame(a=c(1,2))
v <- "a"

Compare the usual usage. The first two give the same result: quoted or not, $ takes what you typed as the column name. So the third one clearly can't work, because it looks for a column named v rather than the column whose name is stored in v.

my_df$"a"
## [1] 1 2

my_df$a
## [1] 1 2

my_df$v
## NULL

That's exactly what's happening to you:

`$`(my_df, "a")
## [1] 1 2

`$`(my_df, v)
## NULL
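
To connect that NULL to the numeric(0) in the question, here is a minimal sketch of the same chain of steps, assuming a small made-up data frame (toy_df and col_name are hypothetical names, not from the question):

```r
# Toy data frame; `col_name` is a hypothetical variable holding a column name.
toy_df <- data.frame(a = c("x", "y", "x"),
                     continuous = c(1.5, 2.5, 3.5),
                     stringsAsFactors = FALSE)
col_name <- "a"

# `$` looks for a column literally named "col_name", which does not exist.
step_1 <- `$`(toy_df, col_name)

# Comparing NULL to a string gives a zero-length logical vector.
step_2 <- step_1 == "x"

# which() on a zero-length logical gives a zero-length integer vector.
step_3 <- which(step_2)

# Indexing with a zero-length integer vector gives a zero-length result.
step_4 <- toy_df$continuous[step_3]
```

No step raises an error or warning, which is why the function fails silently.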

Instead, we need to evaluate v before passing it to $, which do.call does.

do.call(`$`, list(my_df, v))
## [1] 1 2

Or, more appropriately, use [[, which does evaluate its arguments first.

`[[`(my_df, v)
## [1] 1 2
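
Applied to the function from the question, the change might look like the sketch below. The name extract_sample_fixed and the toy data are assumptions for illustration; the only substantive change is swapping $ for [[.

```r
# Same logic as the question's extract_sample(), but using `[[` so that
# cat_var is evaluated to the column name it holds.
extract_sample_fixed <- function(cat_var, level, df) {
  index <- which(df[[cat_var]] == level)
  df$continuous[index]
}

# Small made-up data frame with the same column layout as the question.
toy_df <- data.frame(cat_1 = c("a", "b", "a"),
                     continuous = c(0.1, 0.2, 0.3),
                     stringsAsFactors = FALSE)

result <- extract_sample_fixed(cat_var = "cat_1", level = "a", df = toy_df)
result
```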

Upvotes: 3

Mark

Reputation: 4537

The problem isn't the function; it's the way $ handles its input.

cat_var = "cat_1"
length(`$`(my_df,"cat_1"))
#> [1] 100
length(`$`(my_df,cat_var))
#> [1] 0 

You can instead use [[ to achieve your desired outcome.

cat_var = "cat_1"
length(`[[`(my_df,"cat_1"))
#> [1] 100
length(`[[`(my_df,cat_var))
#> [1] 100

UPDATE

It's been noted that using [[ this way is ugly, and it is. The function form is useful when you want to write something like lapply(stuff, '[[', 1).

Here, you should probably be writing it as my_df[[cat_var]].
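
A short sketch of both forms, assuming a made-up list called stuff and a two-row toy data frame (neither comes from the question):

```r
# Function form: pull the first element out of every list entry.
stuff <- list(c(10, 20), c(30, 40), c(50, 60))
firsts <- lapply(stuff, `[[`, 1)

# Operator form: the more readable choice for a single data frame column.
toy_df <- data.frame(cat_1 = c("a", "b"), stringsAsFactors = FALSE)
cat_var <- "cat_1"
column <- toy_df[[cat_var]]
```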

Also, this question/answer goes into a little more detail about why $ doesn't work the way you want it to.

Upvotes: 4

rg255

Reputation: 4169

The problem lies in the way you are indexing the column. This works with just a slight tweak to your function:

extract_sample <- function(cat_var, level, df = my_df) {
  index <- df[, cat_var] == level
  df$continuous[index]
}

Using it dynamically:

> extract_sample(cat_var = "cat_2", level = "d")
 [1] -0.42769207 -0.75650031  0.64077840 -1.02986889  1.34800344  0.70258431  1.25193247
 [8] -0.62892048  0.48822673  0.10432070  1.11986063 -0.88222370  0.39158408  1.39553002
[15] -0.51464283 -1.05265106  0.58391650  0.10555913  0.16277385 -0.55387829 -1.07822831
[22] -1.23894422 -2.32291394  0.11118881  0.34410388  0.07097271  1.00036812 -2.01981056
[29]  0.63417799 -0.53008375  1.16633422 -0.57130500  0.61614135  1.06768285  0.74182293
[36]  0.56538633  0.16784205 -0.14757303 -0.70928924 -1.91557732  0.61471302 -2.80741967
[43]  0.40552376 -1.88020372 -0.38821089 -0.42043745  1.87370600 -0.46198139  0.10788358
[50] -1.83945868 -0.11052531 -0.38743950  0.68110902 -1.48026285
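
One difference from the original function is worth knowing: this version indexes with a logical vector instead of which(). The two agree on the question's data, which has no missing values, but they can differ if the categorical column contains NA. A minimal sketch, assuming a hypothetical toy data frame with one missing category:

```r
# Toy data with a missing category value (an assumption for illustration;
# the question's data contains no NAs).
toy_df <- data.frame(cat_1 = c("a", NA, "b", "a"),
                     continuous = c(1, 2, 3, 4),
                     stringsAsFactors = FALSE)

logical_index <- toy_df[, "cat_1"] == "a"  # TRUE NA FALSE TRUE

# A logical index keeps a placeholder NA for the missing comparison.
by_logical <- toy_df$continuous[logical_index]

# which() silently drops positions where the comparison is NA.
by_which <- toy_df$continuous[which(logical_index)]
```

Which behaviour is preferable depends on whether missing categories should show up in the result.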

Upvotes: 1
