nesvarbu
nesvarbu

Reputation: 1016

How to transform data.table to rows with unique dates and convert other rows (with duplicate date) to columns

Not sure if my title is good. Lets say I save such data.table:

set.seed(1234);
Type <- c("o", "o", "o", "o", "o", "o", "o", "s", "s", "s", "s", "s");
Contract.Month <- c("F", "G", "F", "G", "F", "G", "H", "F", "G", "F", "G", "F");
Date <- c(as.Date("2015-10-21"), as.Date("2015-10-21"), as.Date("2015-10-22"), as.Date("2015-10-22"), as.Date("2015-10-23"), as.Date("2015-10-23"), as.Date("2015-10-23"),  as.Date("2015-10-21"),  as.Date("2015-10-21"),  as.Date("2015-10-22"),  as.Date("2015-10-22"),  as.Date("2015-10-23"));
Price <- rnorm(12);
Volume <- c(11, 10, 0, 12, 0, 0, 12, 1, 1, 1, 1, 1);
DT = data.table(Date, Price, Type, Contract.Month, Volume);

Which results to such table:

     Date      Price      Type     Contract.Month Volume
 1: 2015-10-21 -1.2070657    o              F     11
 2: 2015-10-21  0.2774292    o              G     10
 3: 2015-10-22  1.0844412    o              F      0
 4: 2015-10-22 -2.3456977    o              G     12
 5: 2015-10-23  0.4291247    o              F      0
 6: 2015-10-23  0.5060559    o              G      0
 7: 2015-10-23 -0.5747400    o              H     12
 8: 2015-10-21 -0.5466319    s              F      1
 9: 2015-10-21 -0.5644520    s              G      1
10: 2015-10-22 -0.8900378    s              F      1
11: 2015-10-22 -0.4771927    s              G      1
12: 2015-10-23 -0.9983864    s              F      1

Now what I am trying to achieve is to get all rows with unique days with Type "o" and convert rows with Type "s" to columns. Result would be something like that:

 Date      Price      Contract.Month Volume  S1.Price     S2.Price
 1: 2015-10-21 -0.7485253        F     11    -0.5466319 -0.5644520
 2: 2015-10-22  0.3015246        G     12    -0.4771927
 3: 2015-10-23  0.4846456        H     12

To explain a bit more, what I have in real scenario is Futures prices and their spreads prices loaded in one big data.table. "o" is outright and "s" is spread. So I want to create continuous contract from outright rows and then add all spreads from that month to that day as columns. So in my example I used volume to create continuous contract (if volume is 0 then do not use that contract for that day, but I would also like to simply use earliest contract and use it to expiration as another approach). In my example first day has both spreads - because outright is F and spreads are F and G. Second day has only G spread because outright is already G. And third day do not have spread information because outright contract month is latter than spread on that day.

I'm trying to do it for a couple of days already but unsuccessfully. I am novice with R and data table, but basic principles seems clear to me, I just don't know how to implement this precise transformation as it doesn't seem very simple. Any help would be greatly appreciated!

Edit Added set.seed

Edit2 If someone finds this question i did something like this (sorry didn't check if it works on sample data but you can get an idea):

DTs <- DT %>% filter(Type=="s")
DTo <- DT %>% filter(Type=="o")
DT1 <- DT %>% filter(Type=="o") %>% group_by(Date) %>% filter(Date == Date & Contract.Month == min(Contract.Month) & Volume != 0)
out = inner_join(DT1, DTs, by = c("Date" = "Date")) %>% filter(Contract.Month.x <= Contract.Month.y)
out = out[, s1_seq:=paste0('s1_', seq(.N)), by=Date]
dcast(out, ... ~ s1_seq, value.var = c("Contract.Month.y", "Price.y"))

Upvotes: 0

Views: 114

Answers (1)

Sebastian
Sebastian

Reputation: 2570

I tried around a bit and this is whatI got to: ( I used set.seed(1234), you need library(dplyr) to execute the code )

DT1 <- DT %>% filter(Type=="o") %>% group_by(Date) %>% filter(Volume==max(Volume))

First part is easy, DT1 contains unique Days from group "o" filtered by maximum volume:

        Date     Price Type Contract.Month Volume
1 2015-10-21 -1.207066    o              F     11
2 2015-10-22 -2.345698    o              G     12
3 2015-10-23 -0.574740    o              H     12

Second part is a bit tricky and the code I used has a bad runtime, but this is the solution i came up with:

DTs <- DT %>% filter(Type=="s")
DTo <- DT %>% filter(Type=="o")

out <- data.frame(matrix(NA,ncol=length(unique(DT$Contract.Month)),nrow=length(unique(DTo$Date))),"Date"=unique(DTo$Date))
names(out) <- c(unique(DT$Contract.Month),"Date")

Create dummy subsets and a new dataframe with all possible contract months.

for(i in 1:nrow(out)){
  for(j in 1:length(unique(DT$Contract.Month))){

    if((nrow(DTo[which(DTo$Contract.Month==names(out)[j]&DTo$Date==out[i,"Date"])])!=0) & 
       (nrow(DTs[which(DTs$Contract.Month==names(out)[j]&DTs$Date==out[i,"Date"])])!=0)){
         out[i,j] <- ifelse(DTo[which(DTo$Contract.Month==names(out)[j]&DTo$Date==out[i,"Date"])] %>% select(Volume) == 0,NA,DTs[which(DTs$Contract.Month==names(out)[j]&DTs$Date==out[i,"Date"])] %>% select(Price))
     }
  }
}

Now every possible contract month is compared in both dummy datasets by day, if the volume from type o != 0 the price from type s is saved in the specific contract month.

cbind(DT1,out[,1:3])

         Date     Price Type Contract.Month Volume          F          G  H
1: 2015-10-21 -1.207066    o              F     11 -0.5466319 -0.5644520 NA
2: 2015-10-22 -2.345698    o              G     12         NA -0.4771927 NA
3: 2015-10-23 -0.574740    o              H     12         NA         NA NA

Bind this data.frame and the result looks like what you are looking for from what I've got from your question.

Hope this was helpful to you.

Upvotes: 1

Related Questions