Reputation: 1016
Not sure if my title is good. Lets say I save such data.table:
set.seed(1234);
Type <- c("o", "o", "o", "o", "o", "o", "o", "s", "s", "s", "s", "s");
Contract.Month <- c("F", "G", "F", "G", "F", "G", "H", "F", "G", "F", "G", "F");
Date <- c(as.Date("2015-10-21"), as.Date("2015-10-21"), as.Date("2015-10-22"), as.Date("2015-10-22"), as.Date("2015-10-23"), as.Date("2015-10-23"), as.Date("2015-10-23"), as.Date("2015-10-21"), as.Date("2015-10-21"), as.Date("2015-10-22"), as.Date("2015-10-22"), as.Date("2015-10-23"));
Price <- rnorm(12);
Volume <- c(11, 10, 0, 12, 0, 0, 12, 1, 1, 1, 1, 1);
DT = data.table(Date, Price, Type, Contract.Month, Volume);
Which results to such table:
Date Price Type Contract.Month Volume
1: 2015-10-21 -1.2070657 o F 11
2: 2015-10-21 0.2774292 o G 10
3: 2015-10-22 1.0844412 o F 0
4: 2015-10-22 -2.3456977 o G 12
5: 2015-10-23 0.4291247 o F 0
6: 2015-10-23 0.5060559 o G 0
7: 2015-10-23 -0.5747400 o H 12
8: 2015-10-21 -0.5466319 s F 1
9: 2015-10-21 -0.5644520 s G 1
10: 2015-10-22 -0.8900378 s F 1
11: 2015-10-22 -0.4771927 s G 1
12: 2015-10-23 -0.9983864 s F 1
Now what I am trying to achieve is to get all rows with unique days with Type "o" and convert rows with Type "s" to columns. Result would be something like that:
Date Price Contract.Month Volume S1.Price S2.Price
1: 2015-10-21 -0.7485253 F 11 -0.5466319 -0.5644520
2: 2015-10-22 0.3015246 G 12 -0.4771927
3: 2015-10-23 0.4846456 H 12
To explain a bit more, what I have in real scenario is Futures prices and their spreads prices loaded in one big data.table. "o" is outright and "s" is spread. So I want to create continuous contract from outright rows and then add all spreads from that month to that day as columns. So in my example I used volume to create continuous contract (if volume is 0 then do not use that contract for that day, but I would also like to simply use earliest contract and use it to expiration as another approach). In my example first day has both spreads - because outright is F and spreads are F and G. Second day has only G spread because outright is already G. And third day do not have spread information because outright contract month is latter than spread on that day.
I'm trying to do it for a couple of days already but unsuccessfully. I am novice with R and data table, but basic principles seems clear to me, I just don't know how to implement this precise transformation as it doesn't seem very simple. Any help would be greatly appreciated!
Edit Added set.seed
Edit2 If someone finds this question i did something like this (sorry didn't check if it works on sample data but you can get an idea):
DTs <- DT %>% filter(Type=="s")
DTo <- DT %>% filter(Type=="o")
DT1 <- DT %>% filter(Type=="o") %>% group_by(Date) %>% filter(Date == Date & Contract.Month == min(Contract.Month) & Volume != 0)
out = inner_join(DT1, DTs, by = c("Date" = "Date")) %>% filter(Contract.Month.x <= Contract.Month.y)
out = out[, s1_seq:=paste0('s1_', seq(.N)), by=Date]
dcast(out, ... ~ s1_seq, value.var = c("Contract.Month.y", "Price.y"))
Upvotes: 0
Views: 114
Reputation: 2570
I tried around a bit and this is whatI got to: ( I used set.seed(1234), you need library(dplyr) to execute the code )
DT1 <- DT %>% filter(Type=="o") %>% group_by(Date) %>% filter(Volume==max(Volume))
First part is easy, DT1 contains unique Days from group "o" filtered by maximum volume:
Date Price Type Contract.Month Volume
1 2015-10-21 -1.207066 o F 11
2 2015-10-22 -2.345698 o G 12
3 2015-10-23 -0.574740 o H 12
Second part is a bit tricky and the code I used has a bad runtime, but this is the solution i came up with:
DTs <- DT %>% filter(Type=="s")
DTo <- DT %>% filter(Type=="o")
out <- data.frame(matrix(NA,ncol=length(unique(DT$Contract.Month)),nrow=length(unique(DTo$Date))),"Date"=unique(DTo$Date))
names(out) <- c(unique(DT$Contract.Month),"Date")
Create dummy subsets and a new dataframe with all possible contract months.
for(i in 1:nrow(out)){
for(j in 1:length(unique(DT$Contract.Month))){
if((nrow(DTo[which(DTo$Contract.Month==names(out)[j]&DTo$Date==out[i,"Date"])])!=0) &
(nrow(DTs[which(DTs$Contract.Month==names(out)[j]&DTs$Date==out[i,"Date"])])!=0)){
out[i,j] <- ifelse(DTo[which(DTo$Contract.Month==names(out)[j]&DTo$Date==out[i,"Date"])] %>% select(Volume) == 0,NA,DTs[which(DTs$Contract.Month==names(out)[j]&DTs$Date==out[i,"Date"])] %>% select(Price))
}
}
}
Now every possible contract month is compared in both dummy datasets by day, if the volume from type o != 0 the price from type s is saved in the specific contract month.
cbind(DT1,out[,1:3])
Date Price Type Contract.Month Volume F G H
1: 2015-10-21 -1.207066 o F 11 -0.5466319 -0.5644520 NA
2: 2015-10-22 -2.345698 o G 12 NA -0.4771927 NA
3: 2015-10-23 -0.574740 o H 12 NA NA NA
Bind this data.frame and the result looks like what you are looking for from what I've got from your question.
Hope this was helpful to you.
Upvotes: 1