pjoternivotich

Reputation: 23

Filtering data, comma vs not comma

I have the following code:

#abnormal return 
exp.ret <- lm((RET-rf)~mkt.rf+smb+hml, data=tesla[tesla$period=="estimation.period",])
tesla$abn.ret <- (tesla$RET-tesla$rf)-predict(exp.ret,tesla)

#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period",])

The first section runs fine, but the second throws this error:

Error in tesla$abn.ret[tesla$period == "event.period", ] : incorrect number of dimensions

I know that the solution is to remove the last comma:

#CAR during event window
CAR <- sum(tesla$abn.ret[tesla$period=="event.period"])

I'm just wondering what the right way to understand this is: when I'm filtering for only part of a data frame, why do I need the trailing comma in some cases but not in others?

Upvotes: 1

Views: 309

Answers (3)

rodolfoksveiga

Reputation: 1261

$, [[]] and [] do different things.

In short:

  • $ and [[]] extract a single column of a data frame or a single item of a list.
    • Extracting a column from a data frame gives a vector, while extracting an item from a list gives an object of whatever class that item has, which can be a data frame, another list, etc.
    • Note that $ accepts only a name (not a numeric index), and that neither $ nor [[]] can select two columns at once.
  • [] slices a data frame or a list, keeping one or more elements.
    • With a single index (no comma), the output has the same class as the original: slicing a data frame gives a data frame, slicing a list gives a list, and so on.
    • With two indices, as in df[rows, cols], the output is a data frame unless exactly one column is selected, in which case it is dropped to a vector by default.
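
To see the difference in return types, here is a minimal sketch on a toy data frame. The name df and its values are made up for illustration and only stand in for tesla:

```r
# Toy data frame standing in for `tesla` (values are invented)
df <- data.frame(
  period  = c("estimation.period", "event.period", "event.period"),
  abn.ret = c(0.01, -0.02, 0.03)
)

col_dollar <- df$abn.ret        # extracts the column as a numeric vector
col_double <- df[["abn.ret"]]   # same vector, selected by name with [[ ]]
sub_single <- df["abn.ret"]     # single index with [ ]: still a data frame

class(col_dollar)   # "numeric"
class(col_double)   # "numeric"
class(sub_single)   # "data.frame"
```
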

In your specific case, you used $ to extract abn.ret, which gives you a vector. You then tried to slice that vector with [ , ], but a plain vector has no rows and columns, only a single index, so R raised the error. Slice the vector with [] and one index (the output will be a vector); [[]] is only for picking out a single element.

Possible ways to subset tesla as you wish:

tesla$abn.ret[tesla$period == "event.period"]
tesla[["abn.ret"]][tesla$period == "event.period"]
tesla[tesla$period == "event.period", "abn.ret"]

You would achieve the same result using tesla[["period"]] instead of tesla$period.
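
If you want to convince yourself that the three forms agree, something like the following should do it. tesla_toy and its values are hypothetical, just a small stand-in for the real data:

```r
# Hypothetical stand-in for `tesla`; the numbers are invented
tesla_toy <- data.frame(
  period  = c("estimation.period", "event.period", "event.period"),
  abn.ret = c(0.01, -0.02, 0.03)
)

# Logical filter, built once and reused
in_event <- tesla_toy$period == "event.period"

v1 <- tesla_toy$abn.ret[in_event]        # vector, one index
v2 <- tesla_toy[["abn.ret"]][in_event]   # vector, one index
v3 <- tesla_toy[in_event, "abn.ret"]     # data frame, row and column index

identical(v1, v2)   # TRUE
identical(v1, v3)   # TRUE

# Cumulative abnormal return over the event rows
CAR <- sum(v1)
```
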

For some extra details/examples, refer to An introduction to R, published by CRAN.

I hope this helps!

Upvotes: 1

Heikki

Reputation: 2254

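If you look at the documentation with the command ?'[', you will find that indexing drops dimensions by default (drop = TRUE): for example, selecting a single column of a matrix or data frame returns a plain vector.

If you want to keep the dimensions, you have to say so explicitly, e.g. x[i, j, drop = FALSE]. Note that R spells the constant FALSE, not False.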
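
A small sketch of the effect of drop, using a made-up data frame (the name df and its columns are only for illustration):

```r
# Invented example data
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Default: selecting one column drops the result to a vector
dropped <- df[df$x > 1, "x"]

# drop = FALSE keeps the result as a one-column data frame
kept <- df[df$x > 1, "x", drop = FALSE]

class(dropped)   # "integer"
class(kept)      # "data.frame"
```
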

Upvotes: 0

Señor O

Reputation: 17432

tesla$abn.ret is a one-dimensional vector. Inside [], each comma separates one dimension from the next, so your [ , ] asks for 2 dimensions, which the vector does not have.

Alternatively you could run

tesla[tesla$period=="event.period", "abn.ret"]

and get the same result, since tesla is 2-d (rows and columns).
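
You can check the dimensions directly with dim(). The sketch below uses a toy data frame (toy and its values are invented) and catches the error instead of stopping:

```r
# Toy stand-in for `tesla`; values are invented
toy <- data.frame(
  period  = c("estimation.period", "event.period", "event.period"),
  abn.ret = c(0.01, -0.02, 0.03)
)

dim(toy)           # two numbers: rows and columns
dim(toy$abn.ret)   # NULL, a plain vector carries no dim attribute

# The trailing comma asks for a second dimension, so this fails;
# tryCatch() captures the error message rather than halting the script
msg <- tryCatch(
  toy$abn.ret[toy$period == "event.period", ],
  error = function(e) conditionMessage(e)
)
msg
```
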

Upvotes: 0
