Reputation: 853
I am quite new to machine learning, so I need some help.
I have a Spark Streaming job that ingests data about users' electricity consumption into Cassandra. I fill multiple tables with that data; the most important one is "hourly_data", which records how much electricity each user consumed within a specific hour.
What I want to do is forecast how much electricity a user will consume by the end of the day, month, or year.
Which libraries and models should I use for that? Is regression what I actually need?
I guess I cannot do the forecasting in the streaming job itself and need to start a batch process for it instead?
Also, it would be nice if, for a specific day, I could plot the expected user behaviour until the end of the day (and the same for the month or year). Which Spark libraries can help me do that? Are there any tutorials?
Thanks a lot
Upvotes: 4
Views: 9175
Reputation: 1468
To forecast for a day, a month, or a year, you need to profile your time series at the matching granularity. For example, if you want to predict usage per day, you need to aggregate the hourly data by day. Input data:
date       | hour | consumption |
---------------------------------
2016-05-07 | 01   | 0.3         |
2016-05-07 | 02   | 0.3         |
2016-05-07 | 03   | 0.3         |
2016-05-08 | :    | 0.3         |
2016-05-08 | :    | 0.3         |
2016-05-09 | 20   | 0.4         |
2016-05-09 | 21   | 0.1         |
2016-05-09 | 22   | 0.2         |
2016-05-09 | 23   | 0.3         |
2016-05-09 | 24   | 0.3         |
Your profiled series should then look like this:
date       | consumption |
--------------------------
2016-05-07 | 1           |
2016-05-08 | 1.3         |
2016-05-09 | 2.3         |
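The hourly-to-daily aggregation can be sketched in plain Python as follows. This is only an illustration: the sample rows and the function name `aggregate_by_day` are made up for the example, and a real job would also group by user.

```python
from collections import defaultdict

# Hypothetical sample rows in the same (date, hour, consumption) layout.
hourly_rows = [
    ("2016-05-07", 1, 0.3),
    ("2016-05-07", 2, 0.3),
    ("2016-05-07", 3, 0.3),
    ("2016-05-09", 20, 0.4),
    ("2016-05-09", 21, 0.1),
]


def aggregate_by_day(rows):
    """Sum hourly consumption into one total per date, sorted by date."""
    totals = defaultdict(float)
    for day, _hour, consumption in rows:
        totals[day] += consumption
    # ISO dates sort correctly as plain strings.
    return dict(sorted(totals.items()))


daily = aggregate_by_day(hourly_rows)
for day, total in daily.items():
    print(day, round(total, 2))
```

In Spark the same step would normally be a `groupBy` on the date column followed by a `sum` over consumption; the pure-Python version just makes the logic explicit.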
Also, if you have missing data, you have to account for it. Once you have profiled your data, you can try different models such as ARIMA or Holt-Winters, and you could also try some state-space models. As for libraries, spark-timeseries has an ARIMA implementation.
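To make those two steps concrete, here is a minimal pure-Python sketch, not a production implementation. It assumes linear interpolation is an acceptable way to fill missing days (only one of several options) and uses a textbook additive Holt-Winters recursion with a simple initialisation. The function names, the smoothing parameters, and the toy series are all hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(daily):
    """Linearly interpolate missing dates in a {date: value} mapping."""
    days = sorted(daily)
    if not days:
        return []
    filled = []
    for left, right in zip(days, days[1:]):
        gap = (right - left).days
        for step in range(gap):
            frac = step / gap
            value = daily[left] + frac * (daily[right] - daily[left])
            filled.append((left + timedelta(days=step), value))
    filled.append((days[-1], daily[days[-1]]))
    return filled


def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Forecast `horizon` steps ahead with additive Holt-Winters smoothing."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    # Initialise level, trend and seasonal terms from the first two seasons.
    level = sum(series[:season_len]) / season_len
    second_mean = sum(series[season_len:2 * season_len]) / season_len
    trend = (second_mean - level) / season_len
    seasonal = [series[i] - level for i in range(season_len)]
    # Update the three components for every later observation.
    for t in range(season_len, len(series)):
        value = series[t]
        season = seasonal[t % season_len]
        prev_level = level
        level = alpha * (value - season) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % season_len] = gamma * (value - level) + (1 - gamma) * season
    n = len(series)
    return [
        level + h * trend + seasonal[(n - 1 + h) % season_len]
        for h in range(1, horizon + 1)
    ]


# Toy example: one missing day is interpolated, then a short periodic
# series (season length 3) is forecast three steps ahead.
filled = fill_missing_days({date(2016, 5, 7): 1.0, date(2016, 5, 9): 2.0})
forecast = holt_winters_additive([1.0, 2.0, 3.0] * 4, 3, 0.5, 0.1, 0.1, 3)
```

For daily electricity data a season length of 7 (weekly pattern) is a natural first guess, and the smoothing parameters would normally be fitted by minimising the forecast error rather than picked by hand.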
Upvotes: 1