Reputation: 223
I have Scala code for anomaly detection on the KDD Cup dataset. The code is at https://github.com/prashantprakash/KDDDataResearch/blob/master/Code/approach1Plus2/src/main/scala/PCA.scala
I want to try a new technique using the StreamingKMeans algorithm from MLlib: update my StreamingKMeans model whenever line 288 in the above code is true, "if( dist < threshold ) {"; i.e., when the test point is classified as normal, update the KMeans model with the new "normal" data point.
I see that StreamingKMeans takes its data in the form of DStreams. Please help me convert the existing RDD to a DStream.
I found a link http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-DStream-td11145.html but it didn't help much.
Also, please advise if there is a better design to solve the problem.
Upvotes: 0
Views: 1951
Reputation: 581
As far as I know, an RDD cannot be converted directly into a DStream, because an RDD is a fixed collection of data, while a DStream represents data arriving continuously over time (internally, a sequence of RDDs, one per batch interval).
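One caveat: Spark Streaming does have StreamingContext.queueStream, which exposes a queue of RDDs as a DStream. It is mainly meant for testing, so treat the following as a rough sketch rather than a production design. The object name, the toy points, k = 2, the 1-second batch interval and the 5-second timeout are all assumptions made up for illustration; they are not taken from the linked code.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object, not part of the original project.
object QueueStreamSketch {

  // Trains StreamingKMeans on one RDD pushed through a queue-backed DStream
  // and returns the cluster centers held by the model afterwards.
  def run(): Array[Vector] = {
    val conf = new SparkConf().setAppName("QueueStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Stand-in for the existing RDD of "normal" points (toy values, not KDD data).
    val normalPoints: RDD[Vector] = ssc.sparkContext.parallelize(Seq(
      Vectors.dense(0.0, 0.1), Vectors.dense(0.2, 0.0),
      Vectors.dense(9.8, 10.0), Vectors.dense(10.1, 9.9)))

    // Each RDD in the queue is served as one batch of the DStream.
    val rddQueue = mutable.Queue[RDD[Vector]](normalPoints)
    val stream = ssc.queueStream(rddQueue)

    val kmeans = new StreamingKMeans()
      .setK(2)                  // assumed number of clusters
      .setDecayFactor(1.0)      // 1.0 = weight all past data equally
      .setRandomCenters(2, 0.0) // 2-dimensional random centers with zero weight

    kmeans.trainOn(stream)

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000) // let a few batches run, then stop
    ssc.stop(stopSparkContext = true, stopGracefully = false)

    kmeans.latestModel().clusterCenters
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```

More RDDs can be enqueued on rddQueue while the context is running (for example, the points your threshold check classifies as normal), and each one is picked up as a later batch.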
If you want to use StreamingKMeans, take the data source that you currently load into an RDD and read it as a DStream instead, possibly using KafkaUtils.createDirectStream or ssc.textFileStream.
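For the ssc.textFileStream route, something along these lines might work. This is only a sketch under several assumptions: the directory /tmp/kdd-normal-points, the comma-separated numeric line format, dim = 3 and k = 10 are placeholders I made up, and parsePoint is a hypothetical helper, so adjust them to whatever your pipeline actually writes out.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example object, not part of the original project.
object TextFileStreamSketch {

  // Hypothetical helper: parses one comma-separated line of numbers into a vector.
  def parsePoint(line: String): Vector =
    Vectors.dense(line.split(',').map(_.trim.toDouble))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TextFileStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Assumed directory: files newly placed here are read as part of the next batch.
    val lines = ssc.textFileStream("/tmp/kdd-normal-points")
    val points = lines.map(parsePoint)

    val kmeans = new StreamingKMeans()
      .setK(10)                       // assumed number of clusters
      .setDecayFactor(1.0)
      .setRandomCenters(3, 0.0)       // assumed 3 features per point

    // The model is updated once per batch as new files arrive.
    kmeans.trainOn(points)

    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this layout, the batch job would write each point that passes the "dist < threshold" check to a new file in the watched directory, and the streaming job would fold it into the model on the next batch.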
Hope this helps!
Upvotes: 1