Srdjan Nikitovic


Time series forecasting in Spark & Spark Streaming

I am quite new to machine learning, so I need some help.

I have a Spark Streaming job that ingests data about users' electricity consumption into Cassandra. I fill multiple tables with that data; the most important one is "hourly_data", which specifies how much electricity each user consumed within a specific hour.

What I want to do is forecast how much electricity a user will consume until the end of the day, month, or year.

Which libraries and models should I use for that? Is regression what I actually need?

I guess I cannot do the forecasting in the streaming job and need to start a batch process for that?

Also, it would be nice if, for a specific day, I could plot the expected user behaviour until the end of the day (and likewise for the month or year). Which Spark libraries can help me do that? Any tutorials?

Thanks a lot


Answers (1)



To forecast for a day, a month, or a year, you need to profile (aggregate) your time series at the corresponding granularity. For example, if you want to predict daily usage, you need to aggregate the hourly data by day. Input data:

date       | hour | consumption|
--------------------------------
2016-05-07 | 01   | 0.3        |
2016-05-07 | 02   | 0.3        |
2016-05-07 | 03   | 0.3        |
2016-05-08 | :    | 0.3        |
2016-05-08 | :    | 0.3        |
2016-05-09 | 20   | 0.4        |
2016-05-09 | 21   | 0.1        |
2016-05-09 | 22   | 0.2        |
2016-05-09 | 23   | 0.3        |
2016-05-09 | 24   | 0.3        |

Your profiled (daily) series would then be:

date       | consumption|
-------------------------
2016-05-07 | 1          |
2016-05-08 | 1.3        |
2016-05-09 | 2.3        |
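The profiling step itself is a group-by-and-sum. In a Spark batch job it would typically be a `groupBy` on the date column followed by a `sum` aggregation; the plain-Python sketch below shows the same logic so it runs without a cluster. The function names `profile` and `incomplete_days` and the row layout are hypothetical. It also flags days with fewer than 24 hourly readings, since a partially recorded day would otherwise look like a low-consumption day.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout for one hourly reading:
# (date as 'YYYY-MM-DD', hour 1..24, consumption).
Row = Tuple[str, int, float]
HOURS_PER_DAY = 24


def profile(rows: Iterable[Row],
            period: Callable[[str], str] = lambda date: date) -> Dict[str, float]:
    """Sum hourly consumption per period, sorted by period label.

    `period` maps a date string to a period label: the default (identity)
    gives a daily profile, `lambda d: d[:7]` a monthly one ('YYYY-MM'),
    and `lambda d: d[:4]` a yearly one ('YYYY').
    """
    totals: Dict[str, float] = defaultdict(float)
    for date, _hour, consumption in rows:
        totals[period(date)] += consumption
    # Round away floating-point noise such as 0.8999999999999999.
    return {label: round(total, 6) for label, total in sorted(totals.items())}


def incomplete_days(rows: Iterable[Row]) -> List[str]:
    """Dates with fewer than 24 distinct hourly readings (i.e. missing data)."""
    hours_seen: Dict[str, set] = defaultdict(set)
    for date, hour, _consumption in rows:
        hours_seen[date].add(hour)
    return sorted(d for d, hours in hours_seen.items() if len(hours) < HOURS_PER_DAY)
```

Passing `period=lambda d: d[:7]` or `period=lambda d: d[:4]` gives the monthly or yearly profile from the same hourly rows, so one function covers all three forecasting horizons.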

If you have missing data, you also have to account for it. Once you have profiled your data, you can try different models such as ARIMA and Holt-Winters, and you could also try some state-space models. As for libraries, spark-timeseries has an ARIMA implementation.
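As a rough illustration of one of the models mentioned, here is a minimal additive Holt-Winters (triple exponential smoothing) sketch in plain Python. It assumes a profiled daily series and, for electricity data, a weekly season (`season_length=7`) would be a natural first guess. The function name `holt_winters_additive`, the fixed smoothing parameters `alpha`, `beta` and `gamma`, and the simple initialisation from the first two seasons are illustrative assumptions, not tuned values; in practice you would fit the parameters (for example by minimising one-step-ahead error) or use a library implementation.

```python
from typing import List, Sequence


def holt_winters_additive(
    series: Sequence[float],
    season_length: int,
    alpha: float = 0.5,   # level smoothing (illustrative, not tuned)
    beta: float = 0.1,    # trend smoothing (illustrative, not tuned)
    gamma: float = 0.1,   # seasonal smoothing (illustrative, not tuned)
    horizon: int = 1,
) -> List[float]:
    """Additive Holt-Winters point forecasts for the next `horizon` steps."""
    m = season_length
    if m < 1 or len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initialise the level as the mean of the first season, the trend as the
    # average per-step change between the first two seasons, and the seasonal
    # components as deviations of the first season from its mean.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    season = [x - level for x in series[:m]]

    for t, x in enumerate(series):
        s = season[t % m]  # seasonal component from one season ago
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Winters' original seasonal update, using the freshly updated level.
        season[t % m] = gamma * (x - level) + (1 - gamma) * s

    # h-step forecast: last level + h * trend + matching seasonal component.
    n = len(series)
    return [level + h * trend + season[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]
```

For example, calling it on a daily profile with `season_length=7, horizon=30` would give a rough 30-day projection that could be plotted after the observed history; the same series could be fed to spark-timeseries' ARIMA for comparison.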

