Reputation: 389
I have a data table which looks like:
require(data.table)
df <- data.table(Day = seq(as.Date('2014-01-01'), as.Date('2014-12-31'), by = 'days'), Number = 1:365)
I want to subset my data table so that it returns only those of the first 110 rows whose Number value is greater than 10. When I use
df2 <- subset(df[1:110,], df$Number[1:110] > 10)
everything works well. However, if I subset using
df2 <- subset(df[1:110,], df[1:110,2] > 10)
R returns the following error:
Error in `[.data.table`(x, r, vars, with = FALSE) :
i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT (in the spirit of A[B] in FAQ 2.14). Please report to data.table issue tracker if you'd like this, or add your comments to FR #657.
Shouldn't both ways of subsetting behave the same? The problem is that I want to use this subset inside an apply call, where the column names of the data table change. Hence, I cannot use the column name with the $-operator to refer to the second column and want to use the index number instead, but that does not work. I could rename the data table columns, or read out the column names and use the $-operator, but my apply function runs over lots of entries and I want to minimize its workload. So how do I make the subsetting with the index number work, and why do I get the mentioned error in the first place? I would like to understand what my mistake is. Thanks!
Upvotes: 0
Views: 333
Reputation: 388982
First, let's understand why it doesn't work in your case. When you do
df[1:110,2] > 10
# Number
# [1,] FALSE
# [2,] FALSE
# [3,] FALSE
# [4,] FALSE
# [5,] FALSE
# [6,] FALSE
# [7,] FALSE
#....
it returns a one-column logical matrix, which is then used for subsetting.
class(df[1:110,2] > 10)
#[1] "matrix"
which works fine on a data.frame
df1 <- data.frame(df)
subset(df1[1:110,], df1[1:110,2] > 10)
# Day Number
#11 2014-01-11 11
#12 2014-01-12 12
#13 2014-01-13 13
#14 2014-01-14 14
#15 2014-01-15 15
#....
but not on a data.table. Unfortunately, subsetting doesn't work that way in data.table. You could convert it into a vector instead of a matrix and then use it for subsetting
subset(df[1:110,], df[1:110][[2]] > 10)
# Day Number
# 1: 2014-01-11 11
# 2: 2014-01-12 12
# 3: 2014-01-13 13
# 4: 2014-01-14 14
# 5: 2014-01-15 15
#...
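Since you mentioned the column names change inside your apply call, here is a minimal sketch of the same idea written without subset(), so it only depends on the column position. The names first_rows and by_index are just placeholders I made up for illustration; the only thing it relies on is that [[ returns a plain vector, which data.table accepts as a logical i.

```r
library(data.table)

# Same example data as in the question
df <- data.table(
  Day = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "days"),
  Number = 1:365
)

# Take the first 110 rows once, so the index vector and the table stay aligned
first_rows <- df[1:110]

# [[2]] extracts the second column as a plain integer vector (not a matrix),
# so the comparison yields an ordinary logical vector that can be used as i
by_index <- first_rows[first_rows[[2]] > 10]
```

Here by_index should hold the rows with Number from 11 to 110, and nothing in the code refers to the column by name.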
The difference becomes clearer when you compare the results of
df[matrix(TRUE), ]
vs
df1[matrix(TRUE), ]
PS - in the first case doing
subset(df[1:110,], Number > 10)
would also have worked.
Upvotes: 4