Reputation: 41
Hello I am working on implementing various technical indicators to better understand the algorithms and their implementations; I do not want to use zoo
or other pre-packaged algorithms.
I want to use data.table
.
Here is the data we are working with:
set.seed(123)
nrows <- 10000
dt <- data.table::data.table(
symbol = sample(LETTERS[1:2], 100, replace = TRUE),
close = runif(nrows, 0, 100),
open = runif(nrows, 0, 100),
high = runif(nrows, 0, 100),
low = runif(nrows, 0, 100),
volume = runif(nrows, 0, 100)
)
I can calculate the simple moving average (sma) very easily using data.table::frollmean
; this is simply the mean of the window:
# calculate simple moving average sma
dt[, sma_short := data.table::frollmean(close, n = 30L, algo = "exact"), by = symbol]
# another way to do the same thing:
dt[, sma_manual_calculation := data.table::frollapply(close, n = 30L, \(x) {
return(mean(x))
}), by = symbol]
identical(dt$sma_short, dt$sma_manual_calculation) # TRUE
The formula I have found for calculating the ema is as shown here: https://bookdown.org/kochiuyu/technical-analysis-with-r-second-edition2/exponential-moving-average-ema.html
If anyone has a different formula or this one shown is wrong please let me know and I would love an explanation - I seek to understand the algorithm and the maths behind
From what I've understood an exponential moving average is a type of moving average that gives more weight to recent observations.
beta = 2 / (n + 1) # the smoothing factor
ema_t(P, n) = beta * P_t + beta (1 - beta) * P_(t-1) + beta (1 - beta)^2 * P_(t-2) + ...
ema_t(P, n) = beta * P_t + (1 - beta) * ema_(t-1)(P, n)
This the formula I've found in a function from the previous link I mentioned above; I made some small modifications for efficiency:
myEMA <- function (price, n) {
# calculate the smoothing coefficient beta
beta <- 2 / (n + 1)
# pre-allocate the vector with NA values
ema <- rep(NA_real_, n - 1)
# calculate first value as the average of the sliding window
ema[n] <- mean(price[1:n])
for (i in (n + 1):length(price)){
ema[i] <- beta * price[i] + (1 - beta) * ema[i - 1]
}
return(as.list(ema))
}
My question is how would I accomplish this same thing with data.table
. I am certain this must be possible with data.table::frollapply
.
As always with R
I would like to stick first to using vectorised operations, avoid for loops (prefer apply
family of functions if necessary) and first I want to use data.table
.
What I seek is to implement the algorithm myself in the most computationally efficient way possible.
Upvotes: 0
Views: 802
Reputation: 41240
EMA is an IIR filter which you can calculate with the signal
package:
EMA <-function(x,n) signal::filter(signal::Arma(b = 2 / (n + 1), a = c(1,2 / (n + 1)-1)),x)
dt[,EMA:=EMA(close,100)]
dygraphs::dygraph(dt[,.(.I,close,EMA)])
This is twice slower than custom cpp code, but faster to program:
microbenchmark::microbenchmark(dt[, ema_short := ema(close, 30L), by = symbol],dt[, ema_short := EMA(close, 30L), by = symbol])
Unit: microseconds
expr min lq mean median uq max neval
dt[, `:=`(ema_short, ema(close, 30L)), by = symbol] 819.8 886.4 1153.392 925.45 1045.20 9245.9 100
dt[, `:=`(ema_short, EMA(close, 30L)), by = symbol] 1266.3 1683.8 2061.642 1793.80 1962.95 8979.6 100
Upvotes: 2
Reputation: 41
Thank you for the comments, indeed I realised that this cannot be done with data.table::frollapply
or other such functions since we need to access previous values etc as we go.
I decided to implement this algorithm in Cpp
and call this using Rcpp
:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector ema(NumericVector price, int n) {
// define beta
double beta = 2.0 / (n + 1.0);
// pre-allocate the vector with NA values
NumericVector ema(price.size(), NA_REAL);
// calculate the first value as the average of the first n values
// ema[n] = sum(price[Range(0, n - 1)]) / n;
ema[n - 1] = mean(price[Range(0, n - 1)]);
for (int i = n; i <= price.size(); i++) {
ema[i] = beta * price[i] + (1.0 - beta) * ema[i - 1];
}
return ema;
}
I then use this from my code with:
Rcpp::sourceCpp("./modules/ema.cpp")
dataset[, ema_short := ema(close, 30L), by = symbol]
I include here a benchmark:
Upvotes: 0