R data.table subsetting within a group and splitting a data table into two

Question

I have the following data.table.

ts,id
1,a
2,a
3,a
4,a
5,a
6,a
7,a
1,b
2,b
3,b
4,b

I want to subset this data.table into two. The criteria is to have approximately the first half for each group (in this case column "id") in one data table and the remaining in another data.table. So the expected result are two data.tables as follows

ts,id
1,a
2,a
3,a
4,a
1,b
2,b

and

 ts,id
 5,a
 6,a
 7,a
 3,b
 4,b

I tried the following,

z1 = x[,.SD[.I < .N/2,],by=dev]
z1

and got just the following

id ts
a  1
a  2
a  3

Somehow, .I within the .SD isn't working the way I think it should. Any help appreciated. Thanks in advance.

mnel · Accepted Answer

.I gives the row locations with respect to the whole data.table. Thus it can't be used like that within .SD.

Something like

DT[, subset := seq_len(.N) > .N/2,by='id']

subset1 <- DT[(subset)][,subset:=NULL]
subset2 <- DT[!(subset)][,subset:=NULL]

subset1
#    ts id
# 1:  4  a
# 2:  5  a
# 3:  6  a
# 4:  7  a
# 5:  3  b
# 6:  4  b
subset2
#   ts id
# 1:  1  a
# 2:  2  a
# 3:  3  a
# 4:  1  b
# 5:  2  b

Should work

For more than 2 groups, you could use cut to create a factor with the appropriate number of levels

Something like

 DT[, subset := cut(seq_len(.N), 3, labels= FALSE),by='id']
 # you could copy to the global environment a subset for each, but this 
 # will not be memory efficient!
 list2env(setattr(split(DT, DT[['subset']]),'names', paste0('s',1:3)), .GlobalEnv)

R data.table subsetting within a group and splitting a data table into two

Answers (2)

Related Questions