sh_student

Reputation: 389

Error subsetting data table with "[]" but not with $-operator

I have a data table which looks like this:

require(data.table)
df <- data.table(Day = seq(as.Date('2014-01-01'), as.Date('2014-12-31'), by = 'days'), Number = 1:365)

I want to subset my data table so that it returns only those of the first 110 rows whose Number is greater than 10. When I use

df2 <- subset(df[1:110,], df$Number[1:110] > 10)

everything works well. However, if I subset using

df2 <- subset(df[1:110,],  df[1:110,2] > 10)

R returns the following error:

Error in `[.data.table`(x, r, vars, with = FALSE) : 
  i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.

Shouldn't both ways of subsetting give the same result? The problem is that I want to use this subset inside an apply call, where the column names of the data table change, so I cannot refer to the second column by name with the $-operator. I want to use the index number instead, but that does not work. I could rename the columns, or read out the column names and then use the $-operator, but my apply function runs over a lot of entries and I want to minimize its workload.

So how do I make subsetting by index number work, and why do I get this error in the first place? I would like to understand what my mistake is. Thanks!

Upvotes: 0

Views: 333

Answers (1)

Ronak Shah

Reputation: 388982

First, let's understand why it doesn't work in your case. When you run

df[1:110,2] > 10

#       Number
#  [1,]  FALSE
#  [2,]  FALSE
#  [3,]  FALSE
#  [4,]  FALSE
#  [5,]  FALSE
#  [6,]  FALSE
#  [7,]  FALSE
#....

it returns a one-column logical matrix, and that matrix is what gets passed on as the subsetting condition.

class(df[1:110,2] > 10)
#[1] "matrix"

A matrix like this works fine as the row condition on a data.frame

df1 <- data.frame(df)
subset(df1[1:110,],  df1[1:110,2] > 10)

#           Day Number
#11  2014-01-11     11
#12  2014-01-12     12
#13  2014-01-13     13
#14  2014-01-14     14
#15  2014-01-15     15
#....

but not on a data.table: as the error message shows, data.table's [ rejects a matrix as the row index i. You can extract the column as a vector with [[ instead of producing a matrix, and then use that vector for subsetting

subset(df[1:110,],  df[1:110][[2]] > 10)

#            Day Number
#  1: 2014-01-11     11
#  2: 2014-01-12     12
#  3: 2014-01-13     13
#  4: 2014-01-14     14
#  5: 2014-01-15     15
#...
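Since the question is about doing this by column position inside an apply call, here is a minimal sketch of how the [[ approach could be wrapped in a function. The helper name filter_by_index and its arguments are hypothetical, made up for illustration; only data.table's [ and [[ are relied on.

```r
library(data.table)

df <- data.table(
  Day = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "days"),
  Number = 1:365
)

# `[` with a column index keeps a one-column data.table,
# so the comparison comes back as a logical matrix.
cond_matrix <- df[1:110, 2] > 10
is.matrix(cond_matrix)

# `[[` extracts the column itself, so the comparison is a plain logical vector.
cond_vector <- df[1:110][[2]] > 10
is.matrix(cond_vector)

# Hypothetical helper: keep the rows in `rows` whose column number `col_idx`
# is greater than `threshold`. No column names are needed.
filter_by_index <- function(dt, rows, col_idx, threshold) {
  sub <- dt[rows]
  keep <- sub[[col_idx]] > threshold  # logical vector, one entry per row of `sub`
  sub[keep]
}

df2 <- filter_by_index(df, 1:110, 2, 10)
head(df2)
```

Because the helper only uses the column position, it should keep working when the column names change between calls.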

The difference becomes clearer when you compare the results of

df[matrix(TRUE), ]

vs

df1[matrix(TRUE), ]
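A small sketch of that comparison, wrapping the data.table call in tryCatch so the script keeps running. The exact behaviour and error text may depend on the installed data.table version, so the message is captured rather than assumed.

```r
library(data.table)

df <- data.table(
  Day = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "days"),
  Number = 1:365
)
df1 <- as.data.frame(df)

# data.frame: a logical matrix in the row position is used like a
# logical vector (recycled), so rows are returned.
df_rows <- nrow(df1[matrix(TRUE), ])
df_rows

# data.table: the same call is expected to fail with the
# "i is invalid type (matrix)" error quoted in the question.
# The message is captured instead of letting the error stop the script.
dt_result <- tryCatch(
  df[matrix(TRUE), ],
  error = function(e) conditionMessage(e)
)
dt_result
```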

PS: in your first example, doing

subset(df[1:110,], Number > 10)

would also have worked.

Upvotes: 4
