Reputation:
My case is that I have a data file for each commodity, I write the data file like this into R:
price_data <- read.table("commodityA.txt", sep="\t", header=TRUE, dec=",")
Then I get the following table as a data frame:
TIME return
... ...
2005-01-05 10:15:00 0.5
2005-01-05 10:16:00 0.6
2005-01-05 10:17:00 0.3
2005-01-05 10:18:00 0.1
2005-01-05 10:19:00 0.5
2005-01-05 10:20:00 0.5
2005-01-05 10:21:00 0.2
2005-01-05 10:22:00 0.5
2005-01-05 10:23:00 0.2
2005-01-05 10:24:00 0.5
Above is the data for one commodity. Additionally to that I have data frame that looks like this:
TIME Event Type
2004-12-15 12:45:00 A
2005-01-05 10:20:00 B
2005-10-31 11:05:00 C
I am trying to find a way to connect the events of Excel with the financial data.
So for example, for the second event at 2005-01-05 10:20:00
.
This event will be t=0
with a return of 0.5
. The minute before will be t=-1
while the minute after the event is t=1
.
I want to compare the three given returns of the three t
here with the expected return to check for abnormalities... Therefore I need a mean of return for the dates that were before.
Let's take a timeframe with four minutes here, e.g.t=-5 to t=-2
, so we have:
2005-01-05 10:15:00 0.5
2005-01-05 10:16:00 0.6
2005-01-05 10:17:00 0.3
2005-01-05 10:18:00 0.1
Mean-Calculation: (0.5+0.6+0.3+0.1)/4 = 0.375
.
Then check for abnormalities for the three t
t=-1: 0.5 - 0.375 = 0.125
t= 0: 0.5 - 0.375 = 0.125
t= 1: 0.2 - 0.375 = -0.175
Then write the results to a data frame and later into excel with following structure, to get the following list in the end:
TIME Event Type t=-1 t= 0 t= 1
2004-12-15 12:45:00 A ... ... ...
2005-01-05 10:20:00 B 0.125 0.125 -0.175
2005-10-31 11:05:00 C ... ... ...
Is there a possibility to create a loop or something to do the calculation for all TIME
given in excel so that I have a complete list of Event time, type and abnormalities? I have more than 50 events in the excel list.
Every help is appreciated. Thank you!
Upvotes: 0
Views: 125
Reputation: 4949
What you want to do is trivially simple.
library(tidyverse)
library(lubridate)
price_data = tribble(
~TIME, ~return,
"2005-01-05 10:15:00", 0.5,
"2005-01-05 10:16:00", 0.6,
"2005-01-05 10:17:00", 0.3,
"2005-01-05 10:18:00", 0.1,
"2005-01-05 10:19:00", 0.5,
"2005-01-05 10:20:00", 0.5,
"2005-01-05 10:21:00", 0.2,
"2005-01-05 10:22:00", 0.5,
"2005-01-05 10:23:00", 0.2,
"2005-01-05 10:24:00", 0.5
) %>% mutate(TIME = ymd_hms(TIME))
event_data = tribble(
~TIME, ~Event.Type,
"2004-12-15 12:45:00", "A",
"2005-01-05 10:20:00", "B",
"2005-01-05 10:21:00", "C",
"2005-01-05 10:23:00", "A",
"2005-10-31 11:05:00", "C"
) %>% mutate(TIME = ymd_hms(TIME))
f1 = function(event_time, price_data){
out = tibble(`t-1` = NA, t0 = NA, `t+1`=NA)
idx = which(price_data$TIME==event_time)
if(length(idx)==0) return(out)
if(idx<6) return(out)
if(idx>(nrow(price_data)-1)) return(out)
mt2t5 = mean(price_data$return[(idx-5):(idx-2)])
tibble(`t-1` = price_data$return[idx-1] - mt2t5,
t0 = price_data$return[idx] - mt2t5,
`t+1` = price_data$return[idx+1] - mt2t5)
}
event_data %>%
mutate(data = map(TIME, f1, price_data)) %>%
unnest(data)
output
# A tibble: 5 x 5
TIME Event.Type `t-1` t0 `t+1`
<dttm> <chr> <dbl> <dbl> <dbl>
1 2004-12-15 12:45:00 A NA NA NA
2 2005-01-05 10:20:00 B 0.125 0.125 -0.175
3 2005-01-05 10:21:00 C 0.125 -0.175 0.125
4 2005-01-05 10:23:00 A 0.175 -0.125 0.175
5 2005-10-31 11:05:00 C NA NA NA
And it's ready!
However, do not miss the appropriate security features in the f1
function. These are the indices idx<6
and idx>(nrow(price_data)-1)
Update
Ok, let's try to modify our function f1
so that t1
and t2
are arguments that take whatever value you want.
Here is the corrected code.
library(tidyverse)
library(lubridate)
price_data = tribble(
~TIME, ~return,
"2005-01-05 10:10:00", 0.5,
"2005-01-05 10:11:00", 0.6,
"2005-01-05 10:12:00", 0.3,
"2005-01-05 10:13:00", 0.1,
"2005-01-05 10:14:00", 0.5,
"2005-01-05 10:15:00", 0.5,
"2005-01-05 10:16:00", 0.6,
"2005-01-05 10:17:00", 0.3,
"2005-01-05 10:18:00", 0.1,
"2005-01-05 10:19:00", 0.5,
"2005-01-05 10:20:00", 0.5,
"2005-01-05 10:21:00", 0.2,
"2005-01-05 10:22:00", 0.5,
"2005-01-05 10:23:00", 0.2,
"2005-01-05 10:24:00", 0.5
) %>% mutate(TIME = ymd_hms(TIME))
event_data = tribble(
~TIME, ~Event.Type,
"2004-12-15 12:45:00", "A",
"2005-01-05 10:20:00", "B",
"2005-01-05 10:21:00", "C",
"2005-01-05 10:23:00", "A",
"2005-10-31 11:05:00", "C"
) %>% mutate(TIME = ymd_hms(TIME))
f1 = function(event_time, price_data, t1=2, t2=-2){
out = tibble(`t-1` = NA, t0 = NA, `t+1`=NA)
idx = which(price_data$TIME==event_time)
if(length(idx)==0) return(out)
if((idx+t1)<1 | (idx+t2)<1 |
(idx+t1)>nrow(price_data) | (idx+t2)>nrow(price_data) |
idx==(nrow(price_data)-1) | idx==1) return(out)
mt1t2 = mean(price_data$return[(idx+t1):(idx+t2)])
tibble(`t-1` = price_data$return[idx-1] - mt1t2,
t0 = price_data$return[idx] - mt1t2,
`t+1` = price_data$return[idx+1] - mt1t2)
}
event_data %>%
mutate(data = map(TIME, f1, price_data, 2, -8)) %>%
unnest(data)
ouyput
# A tibble: 5 x 5
TIME Event.Type `t-1` t0 `t+1`
<dttm> <chr> <dbl> <dbl> <dbl>
1 2004-12-15 12:45:00 A NA NA NA
2 2005-01-05 10:20:00 B 0.127 0.127 -0.173
3 2005-01-05 10:21:00 C 0.136 -0.164 0.136
4 2005-01-05 10:23:00 A NA NA NA
5 2005-10-31 11:05:00 C NA NA NA
Finally, a few remarks and comments.
When doing index manipulations, you must always be careful not to go beyond the permissible ranges of vector or data frame indices. In this case, we must make sure that the indices are always in the range 1: nrow (price_data)
.
So we have to control the parameters t1
and t2
and if they cause going beyond the permissible indices, react accordingly. In this case the NA
response seems to be appropriate (if((idx+t1)<1 | ...idx==1) return(out)
).
Of course, the index cannot be a empty value, and this is what happens when the TIME
value from the event_data
tibble is not found in the price_data
tibble (if(length(idx)==0) return(out)
).
Update 2
f2 = function(event_time, price_data, t1=2, t2=-2){
out = tibble(`t-2` = NA, `t-1` = NA, t0 = NA, `t+1`=NA, `t+2`=NA)
idx = which(price_data$TIME==event_time)
if(length(idx)==0) return(out)
if((idx+t1)<1 | (idx+t2)<1 |
(idx+t1)>nrow(price_data) | (idx+t2)>nrow(price_data) |
idx==(nrow(price_data)-1) | idx==1 |
idx==(nrow(price_data)-2) | idx==2) return(out)
mt1t2 = mean(price_data$return[(idx+t1):(idx+t2)])
tibble(`t-2` = price_data$return[idx-2] - mt1t2,
`t-1` = price_data$return[idx-1] - mt1t2,
t0 = price_data$return[idx] - mt1t2,
`t+1` = price_data$return[idx+1] - mt1t2,
`t+2` = price_data$return[idx+2] - mt1t2)
}
event_data %>%
mutate(data = map(TIME, f2, price_data, 2, -8)) %>%
unnest(data)
output
# A tibble: 5 x 7
TIME Event.Type `t-2` `t-1` t0 `t+1` `t+2`
<dttm> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2004-12-15 12:45:00 A NA NA NA NA NA
2 2005-01-05 10:20:00 B -0.273 0.127 0.127 -0.173 0.127
3 2005-01-05 10:21:00 C 0.136 0.136 -0.164 0.136 -0.164
4 2005-01-05 10:23:00 A NA NA NA NA NA
5 2005-10-31 11:05:00 C NA NA NA NA NA
Upvotes: 2
Reputation: 11
Assuming you have the data in two data frames price_data and event_data with both a column named TIME used to merge, this should do the trick
all_data <- merge(price_data, event_data, all=TRUE)
all_data <- cbind(all_data,"t-1"=c(NA,all_data[,2][-nrow(all_data)]),"t"=all_data[,2],"t+1"=c(all_data[,2][-1],NA))
all_data[,2] <- round(rowMeans(all_data[,4:6]),2)
all_data[,4:6] <- all_data[,4:6]-all_data[,2]
Upvotes: 0