dereckmezquita
dereckmezquita

Reputation: 41

Implement EMA (exponential moving average) in R + data.table

Hello I am working on implementing various technical indicators to better understand the algorithms and their implementations; I do not want to use zoo or other pre-packaged algorithms.

I want to use data.table.

sample data

Here is the data we are working with:

set.seed(123)
nrows <- 10000
dt <- data.table::data.table(
    symbol = sample(LETTERS[1:2], 100, replace = TRUE),
    close = runif(nrows, 0, 100),
    open = runif(nrows, 0, 100),
    high = runif(nrows, 0, 100),
    low = runif(nrows, 0, 100),
    volume = runif(nrows, 0, 100)
)

sma (simple moving average)

I can calculate the simple moving average (sma) very easily using data.table::frollmean; this is simply the mean of the window:

# calculate simple moving average sma
dt[, sma_short := data.table::frollmean(close, n = 30L, algo = "exact"), by = symbol]

# another way to do the same thing:
dt[, sma_manual_calculation := data.table::frollapply(close, n = 30L, \(x) {
    return(mean(x))
}), by = symbol]

identical(dt$sma_short, dt$sma_manual_calculation) # TRUE

ema (exponential moving average)

The formula I have found for calculating the ema is as shown here: https://bookdown.org/kochiuyu/technical-analysis-with-r-second-edition2/exponential-moving-average-ema.html

If anyone has a different formula or this one shown is wrong please let me know and I would love an explanation - I seek to understand the algorithm and the maths behind

From what I've understood an exponential moving average is a type of moving average that gives more weight to recent observations.

beta = 2 / (n + 1) # the smoothing factor

ema_t(P, n) = beta * P_t + beta (1 - beta) * P_(t-1) + beta (1 - beta)^2 * P_(t-2) + ...

ema_t(P, n) = beta * P_t + (1 - beta) * ema_(t-1)(P, n)

This the formula I've found in a function from the previous link I mentioned above; I made some small modifications for efficiency:

myEMA <- function (price, n) {
    # calculate the smoothing coefficient beta
    beta <- 2 / (n + 1)

    # pre-allocate the vector with NA values
    ema <- rep(NA_real_, n - 1)

    # calculate first value as the average of the sliding window
    ema[n] <- mean(price[1:n])

    for (i in (n + 1):length(price)){
        ema[i] <- beta * price[i] + (1 - beta) * ema[i - 1]
    }

    return(as.list(ema))
}

question

My question is how would I accomplish this same thing with data.table. I am certain this must be possible with data.table::frollapply.

As always with R I would like to stick first to using vectorised operations, avoid for loops (prefer apply family of functions if necessary) and first I want to use data.table.

What I seek is to implement the algorithm myself in the most computationally efficient way possible.

Upvotes: 0

Views: 802

Answers (2)

Waldi
Waldi

Reputation: 41240

EMA is an IIR filter which you can calculate with the signal package:

EMA <-function(x,n)  signal::filter(signal::Arma(b = 2 / (n + 1), a =  c(1,2 / (n + 1)-1)),x)

dt[,EMA:=EMA(close,100)]


dygraphs::dygraph(dt[,.(.I,close,EMA)])

enter image description here

This is twice slower than custom cpp code, but faster to program:

microbenchmark::microbenchmark(dt[, ema_short := ema(close, 30L), by = symbol],dt[, ema_short := EMA(close, 30L), by = symbol])
Unit: microseconds
                                                expr    min     lq     mean  median      uq    max neval
 dt[, `:=`(ema_short, ema(close, 30L)), by = symbol]  819.8  886.4 1153.392  925.45 1045.20 9245.9   100
 dt[, `:=`(ema_short, EMA(close, 30L)), by = symbol] 1266.3 1683.8 2061.642 1793.80 1962.95 8979.6   100

Upvotes: 2

dereckmezquita
dereckmezquita

Reputation: 41

Thank you for the comments, indeed I realised that this cannot be done with data.table::frollapply or other such functions since we need to access previous values etc as we go.

I decided to implement this algorithm in Cpp and call this using Rcpp:

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector ema(NumericVector price, int n) {
    // define beta
    double beta = 2.0 / (n + 1.0);

    // pre-allocate the vector with NA values
    NumericVector ema(price.size(), NA_REAL);

    // calculate the first value as the average of the first n values
    // ema[n] = sum(price[Range(0, n - 1)]) / n;
    ema[n - 1] = mean(price[Range(0, n - 1)]);

    for (int i = n; i <= price.size(); i++) {
        ema[i] = beta * price[i] + (1.0 - beta) * ema[i - 1];
    }
    
    return ema;
}

I then use this from my code with:

Rcpp::sourceCpp("./modules/ema.cpp")

dataset[, ema_short := ema(close, 30L), by = symbol]

benchmark

I include here a benchmark:

ema R vs ema Cpp microbenchmark

Upvotes: 0

Related Questions