ismisesisko

Reputation: 151

Spark Streaming Schema

Given a stream of SparkFlumeEvents (or any DStream), how does one map it to an appropriate schema so that the stream can be saved to Cassandra with

stream.saveToCassandra(keyspace,table)

A naive attempt complains about missing columns.

Is the best approach to stream.map() each element to a given object (which seems cumbersome)?

OR...

Another approach seems to be using stream.foreachRDD and somehow mapping to a DataFrame. That also seems cumbersome, given that the stream supports saving to Cassandra directly.

So what is the right way?

Upvotes: 0

Views: 89

Answers (1)

Matija Gobec

Reputation: 880

Streams are saved to Cassandra with the Spark Cassandra Connector by specifying the keyspace, the table name, and the columns to insert into. The other approach is to map the data to UDTs and insert those into the database. I prefer specifying columns, as it's the fastest way if you just need to insert data. The example from the documentation does exactly that, but you can use any variant of it:

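import com.datastax.spark.connector._            // SomeColumns
import com.datastax.spark.connector.streaming._  // saveToCassandra on DStreams

// stream is assumed to be a DStream[String]; saveToCassandra returns Unit
stream.flatMap(_.split("\\s+"))
    .map(x => (x, 1))
    .reduceByKey(_ + _)
    .saveToCassandra("streaming_test", "words", SomeColumns("word", "count"))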
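For the stream.map() route raised in the question, a minimal sketch might look like the following. It assumes the stream carries lines of the form "word,count" and reuses the "streaming_test"."words" table from the example above. The WordCount case class, the SaveStream object, and the toRow parser are hypothetical names introduced only for illustration. The idea is that the connector matches case class fields to table columns by name, so the field names should line up with the columns.

```scala
import com.datastax.spark.connector._            // SomeColumns
import com.datastax.spark.connector.streaming._  // adds saveToCassandra to DStream
import org.apache.spark.streaming.dstream.DStream

// Hypothetical row type: field names are assumed to match the table's column names.
case class WordCount(word: String, count: Int)

object SaveStream {
  // Hypothetical parser for lines shaped like "word,count".
  // Throws on malformed input; a real job would filter or handle bad lines.
  def toRow(line: String): WordCount = {
    val Array(word, count) = line.split(",", 2)
    WordCount(word.trim, count.trim.toInt)
  }

  // Map every element to the case class, then save the mapped stream.
  def save(lines: DStream[String]): Unit =
    lines.map(toRow)
      .saveToCassandra("streaming_test", "words", SomeColumns("word", "count"))
}
```

For a stream of SparkFlumeEvents, the same pattern should apply once each event has been converted to a String or directly to the case class inside map(). If the "missing columns" complaint comes from an element type whose fields do not match the table, mapping to a matching case class or tuple like this is usually enough, without going through foreachRDD.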

Upvotes: 0
