Reputation: 661
I am pretty new to Spark and would like some advice on how to approach the following problem.
I have candle data (high, low, open, close) for every minute of a trading day, spread across a year. This represents about 360,000 data points.
What I want to do is run simulations across that data (possibly at every data point): for a given data point, take the previous (or next) x data points and run some code across them to produce a result.
Ideally this would be a map-style function, but you cannot nest distributed operations in Spark. The only approaches I can think of are to build a Dataset keyed by Candle with the related data denormalised onto each row, or to partition on every key; either way seems inefficient.
Ideally I am looking for something with the shape (Candle, List) -> Double, or similar.
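As a rough sketch of the shape I'm after (Candle here is just an illustrative case class, not something I have settled on):

```scala
case class Candle(ts: Long, open: Double, high: Double, low: Double, close: Double)

// For each candle, take a window of surrounding candles and reduce it to a score.
def simulate(current: Candle, window: List[Candle]): Double = ???
```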
I am sure there is a better approach.
I am using Spark 2.1.0 and using Yarn as the scheduling engine.
Upvotes: 1
Views: 227
Reputation: 2345
I've done a fair bit of time series processing in Spark, and have spent some time thinking about exactly the same problem.
Unfortunately, in my opinion, there is no nice way to process all of the data in the way you want without structuring it as you suggested. I think we just have to accept that this kind of thing is an expensive operation, whether we are using Spark, pandas or Postgres.
You may hide the code complexity by using Spark SQL window functions (look at rangeBetween / RANGE BETWEEN), but the essence of what you are doing cannot be escaped.
Protip: map the data to features->label once and write it to disk to make dev/testing faster!
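That is, something like this (path is illustrative), so you pay for the windowing once and iterate on the cheap modelling part afterwards:

```scala
scored
  .select($"ts", $"closes", $"score")
  .write.mode("overwrite")
  .parquet("/data/candle-features")
```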
Upvotes: 1