python_enthusiast
python_enthusiast

Reputation: 946

How to filter data.table conditionally in R

I am doing operations in two columns of a data.table in R. I have first filtered the datatable by DATE, by1min and by EX, but then I realized that I actually don't want the data within the 1 minute bin. I want it for everything before that bin.

What I had before was:

bid_ask <- new_file[,list(btail=tail(BID,n=1),atail=tail(ASK,n=1)), by=c("DATE","by1min","EX")]

And now I wanted to do something like:

bid_ask <- new_file[,list(btail=tail(BID,n=1),atail=tail(ASK,n=1)), by=c("DATE",TIME_M[TIME_M<by1min],"EX")]

But this does not seem to be the structure for data.table.

Does anyone know how to filter conditionally within "by"?

My data looks like this:

         DATE   TIME_M EX SYM_SUFFIX   BID   ASK by1min
1: 2016-03-01 14400.02  P         NA 60.00 95.00  14400
2: 2016-03-01 24889.07  T         NA 60.01 65.00  24840
3: 2016-03-01 24889.25  T         NA 60.01 64.99  24840
4: 2016-03-01 25085.24  T         NA 60.01 64.99  25080
5: 2016-03-01 25085.36  T         NA 62.50 64.99  25080
6: 2016-03-01 25103.37  T         NA 62.51 64.99  25080

Upvotes: 0

Views: 274

Answers (1)

chinsoon12
chinsoon12

Reputation: 25225

The syntax is something like this:

bid_ask <- new_file[, list(btail=tail(BID, n=1),atail=tail(ASK, n=1)), by=.(DATE, TIME_M < by1min, EX)]
bid_ask 
#         DATE TIME_M EX btail atail
#1: 2016-03-01  FALSE  P 60.00 95.00
#2: 2016-03-01  FALSE  T 62.51 64.99
#3: 2016-03-01   TRUE  T 62.51 64.99
#4: 2016-03-01   TRUE  P 60.00 95.00

from ?data.table documentation:

by accepts:

  • a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]

sample data:

dat <- fread("DATE   TIME_M EX SYM_SUFFIX   BID   ASK by1min
2016-03-01 14400.02  P         NA 60.00 95.00  14400
2016-03-01 24889.07  T         NA 60.01 65.00  24840
2016-03-01 24889.25  T         NA 60.01 64.99  24840
2016-03-01 25085.24  T         NA 60.01 64.99  25080
2016-03-01 25085.36  T         NA 62.50 64.99  25080
2016-03-01 25103.37  T         NA 62.51 64.99  25080
2016-03-01 25000  T         NA 62.51 64.99  25080
2016-03-01 14399  P         NA 60.00 95.00  14400")

Upvotes: 1

Related Questions