Reputation: 151
Given a stream of SparkFlumeEvents (or say, any DStream) how does one map to an appropriate schema so that the stream can be saved to Cassandra with
stream.saveToCassandra(keyspace,table)
A naive attempt complains about missing columns.
Is the best approach to stream.map() to given object (which seems cumbersome)?
OR...
another approach seems to be using stream.foreachRDD and somehow mapping to a dataframe. That also seems cumbersome given that the stream method supports saving to cassandra directly.
So what is the right way?
Upvotes: 0
Views: 89
Reputation: 880
Streams are saved into Cassandra using spark cassandra connector by specifying keyspace, tableName and columns to insert into. The other approach is by mapping data to UDTs and inserting that into database. I prefer specifying columns as its the fastest way if you just need to insert data. Example from documentation does exactly the same but you can use any variant of it:
val wc = stream.flatMap(_.split("\\s+"))
.map(x => (x, 1))
.reduceByKey(_ + _)
.saveToCassandra("streaming_test", "words", SomeColumns("word", "count"))
Upvotes: 0