Reputation: 1423
I am using Spark Streaming to receive data from a ZeroMQ queue at a specific interval, enrich it, and save it as Parquet files. I want to compare data from one streaming window to another (later in time, using the Parquet files).
How can I find the timestamp of a specific streaming window, so that I can add it as another field during enrichment to facilitate my comparisons?
JavaStreamingContext javaStreamingContext = new JavaStreamingContext(sparkConf, new Duration(duration));
inputStream = javaStreamingContext.receiverStream(new StreamReceiver( hostName, port, StorageLevel.MEMORY_AND_DISK_SER()));
JavaDStream<myPojoFormat> enrichedData = inputStream.map(new Enricher());
In a nutshell, I want the timestamp of each streaming window (at batch level, not record level).
Upvotes: 2
Views: 905
Reputation: 3055
You can use the transform method of JavaDStream, which takes a Function2 as a parameter. The Function2 receives an RDD and a Time object carrying the batch time, and returns a new RDD. The overall result is a new JavaDStream in which each RDD has been transformed according to the logic you have chosen.
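As a rough illustration of the transform approach, here is a minimal sketch of the batch-level stamping logic. The BatchStamped wrapper and its stampBatch helper are hypothetical names introduced for this example, not part of Spark; the class itself is plain Java so it can be tried without a cluster. The Spark wiring is shown in the trailing comment and assumes Spark Streaming is on the classpath, Java 8 lambdas, and the enrichedData stream from the question.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper: one record plus the time of the batch it arrived in. */
class BatchStamped<T> implements Serializable {
    private static final long serialVersionUID = 1L;

    final long batchTimeMs; // same value for every record of a batch
    final T record;

    BatchStamped(long batchTimeMs, T record) {
        this.batchTimeMs = batchTimeMs;
        this.record = record;
    }

    /** Local stand-in for the per-batch map: stamps all records with one batch time. */
    static <T> List<BatchStamped<T>> stampBatch(List<T> records, long batchTimeMs) {
        List<BatchStamped<T>> out = new ArrayList<>(records.size());
        for (T r : records) {
            out.add(new BatchStamped<>(batchTimeMs, r));
        }
        return out;
    }
}

// Spark wiring (assumption: Spark Streaming available; not compiled here).
// The two-argument lambda selects the Function2 overload of transform, and
// Time.milliseconds() gives the batch time as epoch milliseconds:
//
//   JavaDStream<BatchStamped<myPojoFormat>> stamped = enrichedData.transform(
//       (rdd, time) -> rdd.map(
//           r -> new BatchStamped<myPojoFormat>(time.milliseconds(), r)));
```

Because every record in a batch carries the same batchTimeMs, that field can later serve as the grouping key when comparing one window's Parquet output against another's.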
Upvotes: 4