Reputation: 1069
I have aggregated data with one row per day. I want to split each row into 24 rows, one per hour.
Input:
1 24
Output:
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
...
1 24
Upvotes: 1
Views: 46
Reputation: 8996
Say your time series is a sequence of (day, value) pairs:
(1,10)
(2,5)
(3,4)
...
And you want to convert them into (hour, value) pairs, where the value is repeated for each of the 24 hours of the same day (day d maps to hours (d-1)*24 + 1 through d*24):
(1,10)
(2,10)
(3,10)
...
(24,10)
(25,5)
...
(48,5)
(49,4)
...
(72,4)
...
Here is how to do this in basic Scala:
val timeSeries = Seq(1 -> 10, 2 -> 5, 3 -> 4)

timeSeries.flatMap { case (day, value) =>
  // day d covers global hours (d-1)*24 + 1 through d*24
  (1 to 24).map(h => (h + (day - 1) * 24, value))
}
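For the sample data above, this evaluates to one pair per hour, with each day's value repeated 24 times (abridged):
List((1,10), (2,10), ..., (24,10), (25,5), ..., (48,5), (49,4), ..., (72,4))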
Here is how to do this on Spark:
// Assumes an existing SparkContext named sc
val rddTimeSeries = sc.makeRDD(timeSeries)

// Very similar to what we do in plain Scala
val perHourTs = rddTimeSeries.flatMap { case (day, value) =>
  (1 to 24).map(hour => (hour + (day - 1) * 24, value))
}

// Collecting and printing is fine here because we know the dataset is small
println(perHourTs.collect().toList)
One complication with Spark is that the data may come back out of order, which can scramble your time series. The simplest way to address this is to sort the data before calling an action on the RDD.
// Sort the time series by hour before collecting
perHourTs.sortBy(_._1).collect()
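If you prefer the DataFrame API, the same expansion can be expressed with explode. This is just a sketch, assuming Spark 2.4+ (for the sequence function) and an existing SparkSession named spark:

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1, 10), (2, 5), (3, 4)).toDF("day", "value")

// explode(sequence(lit(1), lit(24))) emits hours 1..24 for every day,
// and the global hour index is derived from the day number
val perHourDf = df
  .withColumn("h", explode(sequence(lit(1), lit(24))))
  .select((($"day" - 1) * 24 + $"h").as("hour"), $"value")
  .orderBy("hour")

Here orderBy plays the same role as the sortBy call above.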
Upvotes: 2