anon
anon

Reputation:

How to do loop calculations in R?

My case is that I have a data file for each commodity, I write the data file like this into R:

price_data <- read.table("commodityA.txt", sep="\t", header=TRUE, dec=",")

Then I get the following table as a data frame:

TIME                  return 
...                   ...
2005-01-05 10:15:00   0.5  
2005-01-05 10:16:00   0.6  
2005-01-05 10:17:00   0.3 
2005-01-05 10:18:00   0.1
2005-01-05 10:19:00   0.5
2005-01-05 10:20:00   0.5  
2005-01-05 10:21:00   0.2  
2005-01-05 10:22:00   0.5 
2005-01-05 10:23:00   0.2
2005-01-05 10:24:00   0.5

Above is the data for one commodity. Additionally to that I have data frame that looks like this:

TIME                 Event Type                 
2004-12-15 12:45:00  A
2005-01-05 10:20:00  B
2005-10-31 11:05:00  C

I am trying to find a way to connect the events of Excel with the financial data. So for example, for the second event at 2005-01-05 10:20:00. This event will be t=0 with a return of 0.5. The minute before will be t=-1 while the minute after the event is t=1.

I want to compare the three given returns of the three t here with the expected return to check for abnormalities... Therefore I need a mean of return for the dates that were before. Let's take a timeframe with four minutes here, e.g.t=-5 to t=-2, so we have:

2005-01-05 10:15:00   0.5  
2005-01-05 10:16:00   0.6  
2005-01-05 10:17:00   0.3 
2005-01-05 10:18:00   0.1

Mean-Calculation: (0.5+0.6+0.3+0.1)/4 = 0.375. Then check for abnormalities for the three t

t=-1:  0.5 - 0.375 =  0.125
t= 0:  0.5 - 0.375 =  0.125
t= 1:  0.2 - 0.375 = -0.175

Then write the results to a data frame and later into excel with following structure, to get the following list in the end:

TIME                 Event Type  t=-1   t= 0  t= 1
2004-12-15 12:45:00  A           ...    ...   ...
2005-01-05 10:20:00  B           0.125  0.125 -0.175
2005-10-31 11:05:00  C           ...    ...   ...

Is there a possibility to create a loop or something to do the calculation for all TIME given in excel so that I have a complete list of Event time, type and abnormalities? I have more than 50 events in the excel list.

Every help is appreciated. Thank you!

Upvotes: 0

Views: 125

Answers (2)

Marek Fiołka
Marek Fiołka

Reputation: 4949

What you want to do is trivially simple.

  1. Read data from files as you write
library(tidyverse)
library(lubridate)

price_data = tribble(
 ~TIME, ~return,
 "2005-01-05 10:15:00", 0.5,
 "2005-01-05 10:16:00", 0.6,
 "2005-01-05 10:17:00", 0.3,
 "2005-01-05 10:18:00", 0.1,
 "2005-01-05 10:19:00", 0.5,
 "2005-01-05 10:20:00", 0.5,
 "2005-01-05 10:21:00", 0.2,
 "2005-01-05 10:22:00", 0.5,
 "2005-01-05 10:23:00", 0.2,
 "2005-01-05 10:24:00", 0.5
) %>% mutate(TIME = ymd_hms(TIME))

event_data = tribble(
  ~TIME,                 ~Event.Type,
  "2004-12-15 12:45:00", "A",
  "2005-01-05 10:20:00", "B",
  "2005-01-05 10:21:00", "C",
  "2005-01-05 10:23:00", "A",
  "2005-10-31 11:05:00", "C"
) %>% mutate(TIME = ymd_hms(TIME))
  1. Prepare a function that will do the calculations as you wrote it
f1 = function(event_time, price_data){
  out = tibble(`t-1` = NA, t0 = NA, `t+1`=NA)
  idx = which(price_data$TIME==event_time)
  if(length(idx)==0) return(out)
  if(idx<6) return(out)
  if(idx>(nrow(price_data)-1)) return(out) 
  mt2t5 = mean(price_data$return[(idx-5):(idx-2)])
  tibble(`t-1` = price_data$return[idx-1] - mt2t5, 
         t0 = price_data$return[idx] - mt2t5, 
         `t+1` = price_data$return[idx+1] - mt2t5) 
}
  1. Make a mutation
event_data %>% 
  mutate(data = map(TIME, f1, price_data)) %>% 
  unnest(data)

output

# A tibble: 5 x 5
  TIME                Event.Type  `t-1`     t0  `t+1`
  <dttm>              <chr>       <dbl>  <dbl>  <dbl>
1 2004-12-15 12:45:00 A          NA     NA     NA    
2 2005-01-05 10:20:00 B           0.125  0.125 -0.175
3 2005-01-05 10:21:00 C           0.125 -0.175  0.125
4 2005-01-05 10:23:00 A           0.175 -0.125  0.175    
5 2005-10-31 11:05:00 C          NA     NA     NA   

And it's ready!

However, do not miss the appropriate security features in the f1 function. These are the indices idx<6 and idx>(nrow(price_data)-1)

Update

Ok, let's try to modify our function f1 so that t1 and t2 are arguments that take whatever value you want. Here is the corrected code.

library(tidyverse)
library(lubridate)

price_data = tribble(
 ~TIME, ~return,
 "2005-01-05 10:10:00", 0.5,
 "2005-01-05 10:11:00", 0.6,
 "2005-01-05 10:12:00", 0.3,
 "2005-01-05 10:13:00", 0.1,
 "2005-01-05 10:14:00", 0.5,
 "2005-01-05 10:15:00", 0.5,
 "2005-01-05 10:16:00", 0.6,
 "2005-01-05 10:17:00", 0.3,
 "2005-01-05 10:18:00", 0.1,
 "2005-01-05 10:19:00", 0.5,
 "2005-01-05 10:20:00", 0.5,
 "2005-01-05 10:21:00", 0.2,
 "2005-01-05 10:22:00", 0.5,
 "2005-01-05 10:23:00", 0.2,
 "2005-01-05 10:24:00", 0.5
) %>% mutate(TIME = ymd_hms(TIME))

event_data = tribble(
  ~TIME,                 ~Event.Type,
  "2004-12-15 12:45:00", "A",
  "2005-01-05 10:20:00", "B",
  "2005-01-05 10:21:00", "C",
  "2005-01-05 10:23:00", "A",
  "2005-10-31 11:05:00", "C"
) %>% mutate(TIME = ymd_hms(TIME))


f1 = function(event_time, price_data, t1=2, t2=-2){
  out = tibble(`t-1` = NA, t0 = NA, `t+1`=NA)
  idx = which(price_data$TIME==event_time)
  if(length(idx)==0) return(out)
  if((idx+t1)<1 | (idx+t2)<1 | 
     (idx+t1)>nrow(price_data) | (idx+t2)>nrow(price_data) | 
     idx==(nrow(price_data)-1) | idx==1) return(out)
  mt1t2 = mean(price_data$return[(idx+t1):(idx+t2)])
  tibble(`t-1` = price_data$return[idx-1] - mt1t2,
         t0 = price_data$return[idx] - mt1t2,
         `t+1` = price_data$return[idx+1] - mt1t2)
}

event_data %>%
  mutate(data = map(TIME, f1, price_data, 2, -8)) %>%
  unnest(data)

ouyput

# A tibble: 5 x 5
  TIME                Event.Type  `t-1`     t0  `t+1`
  <dttm>              <chr>       <dbl>  <dbl>  <dbl>
1 2004-12-15 12:45:00 A          NA     NA     NA    
2 2005-01-05 10:20:00 B           0.127  0.127 -0.173
3 2005-01-05 10:21:00 C           0.136 -0.164  0.136
4 2005-01-05 10:23:00 A          NA     NA     NA    
5 2005-10-31 11:05:00 C          NA     NA     NA   

Finally, a few remarks and comments. When doing index manipulations, you must always be careful not to go beyond the permissible ranges of vector or data frame indices. In this case, we must make sure that the indices are always in the range 1: nrow (price_data). So we have to control the parameters t1 and t2 and if they cause going beyond the permissible indices, react accordingly. In this case the NA response seems to be appropriate (if((idx+t1)<1 | ...idx==1) return(out)).

Of course, the index cannot be a empty value, and this is what happens when the TIME value from the event_data tibble is not found in the price_data tibble (if(length(idx)==0) return(out)).

Update 2

f2 = function(event_time, price_data, t1=2, t2=-2){
  out = tibble(`t-2` = NA, `t-1` = NA, t0 = NA, `t+1`=NA, `t+2`=NA)
  idx = which(price_data$TIME==event_time)
  if(length(idx)==0) return(out)
  if((idx+t1)<1 | (idx+t2)<1 |
     (idx+t1)>nrow(price_data) | (idx+t2)>nrow(price_data) |
     idx==(nrow(price_data)-1) | idx==1 |
     idx==(nrow(price_data)-2) | idx==2) return(out)
  mt1t2 = mean(price_data$return[(idx+t1):(idx+t2)])
  tibble(`t-2` = price_data$return[idx-2] - mt1t2,
         `t-1` = price_data$return[idx-1] - mt1t2,
         t0 = price_data$return[idx] - mt1t2,
         `t+1` = price_data$return[idx+1] - mt1t2,
         `t+2` = price_data$return[idx+2] - mt1t2)
}

event_data %>%
  mutate(data = map(TIME, f2, price_data, 2, -8)) %>%
  unnest(data)

output

# A tibble: 5 x 7
  TIME                Event.Type  `t-2`  `t-1`     t0  `t+1`  `t+2`
  <dttm>              <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 2004-12-15 12:45:00 A          NA     NA     NA     NA     NA    
2 2005-01-05 10:20:00 B          -0.273  0.127  0.127 -0.173  0.127
3 2005-01-05 10:21:00 C           0.136  0.136 -0.164  0.136 -0.164
4 2005-01-05 10:23:00 A          NA     NA     NA     NA     NA    
5 2005-10-31 11:05:00 C          NA     NA     NA     NA     NA   

Upvotes: 2

Julian
Julian

Reputation: 11

Assuming you have the data in two data frames price_data and event_data with both a column named TIME used to merge, this should do the trick

all_data <- merge(price_data, event_data, all=TRUE)
all_data <- cbind(all_data,"t-1"=c(NA,all_data[,2][-nrow(all_data)]),"t"=all_data[,2],"t+1"=c(all_data[,2][-1],NA))
all_data[,2] <- round(rowMeans(all_data[,4:6]),2)
all_data[,4:6] <- all_data[,4:6]-all_data[,2]

Upvotes: 0

Related Questions