pppnnn

Reputation: 277

Apache Spark streaming mapping object and printing attribute

I'm reading from a text file, parsing each line to JSON and am attempting to print one of the attributes:

val msgData = ssc.textFileStream(dataDir)

val msgs = msgData.map(MessageParser.parse)
msgs.foreach(msg => println(msg.my_attribute))

However, I get the following error on compilation:

value my_attribute is not a member of org.apache.spark.rdd.RDD[com.imgzine.analytics.messages.Message]

What am I missing?

Thanks

Upvotes: 1

Views: 2157

Answers (1)

maasg

Reputation: 37435

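Spark Streaming discretizes a stream of data into micro-batches. The resulting abstraction is called a DStream, which is a sequence of RDDs, one per batch interval. DStream.foreach is the older, deprecated name for foreachRDD, so the function you pass to it receives a whole RDD[Message] for each batch rather than a single Message. That is exactly what the compiler error is telling you.

Translated to your case, you need to operate on the contents of each RDD, not on the DStream itself: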

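msgs.foreachRDD(rdd => rdd.foreach(msg => println(msg.my_attribute)))

Keep in mind that rdd.foreach runs on the executors, so on a cluster the println output ends up in the executor logs, not in your driver console. In local mode it shows up in the console as expected.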

DStreams also offer a helper method that prints the first elements (10 by default) of each batch RDD on the driver:

dstream.print()

Of course, print() just calls .toString on each object in the RDD and prints the result, which may not be what you want if you only care about my_attribute, as stated in the question.
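For a quick local experiment, here is a minimal sketch that projects my_attribute before calling print(), and also shows a per-batch foreachRDD that collects the values back to the driver. It assumes spark-core and spark-streaming are on the classpath. Message is a hypothetical stand-in for com.imgzine.analytics.messages.Message, the inline line => Message(line.trim) replaces MessageParser.parse (no JSON parsing), and queueStream replaces textFileStream so no input directory is needed. AttributeDemo and run are illustrative names only.

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical stand-in for com.imgzine.analytics.messages.Message
case class Message(my_attribute: String)

object AttributeDemo {
  /** Feeds each inner Seq as one micro-batch and returns every my_attribute seen. */
  def run(batches: Seq[Seq[String]]): List[String] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("attribute-demo")
    val ssc = new StreamingContext(conf, Seconds(1))

    // queueStream stands in for textFileStream: each queued RDD becomes one batch.
    val queue = mutable.Queue[RDD[String]]()
    batches.foreach(lines => queue += ssc.sparkContext.parallelize(lines))

    // Hypothetical parser standing in for MessageParser.parse (no JSON involved).
    val msgs = ssc.queueStream(queue).map(line => Message(line.trim)) // DStream[Message]

    // Option 1: project the attribute first, then print() the first 10 values
    // of each batch on the driver.
    msgs.map(_.my_attribute).print()

    // Option 2: handle each batch RDD yourself. The function passed to foreachRDD
    // runs on the driver; collect() brings the batch's values back to the driver,
    // so println writes to the driver's console.
    val seen = mutable.ListBuffer[String]()
    msgs.foreachRDD { rdd =>
      val attrs = rdd.map(_.my_attribute).collect()
      seen.synchronized { seen ++= attrs }
      attrs.foreach(println)
    }

    ssc.start()
    // Rough guess: about one second per queued batch plus start-up slack.
    ssc.awaitTerminationOrTimeout(batches.size * 1000L + 5000L)
    ssc.stop(stopSparkContext = true, stopGracefully = false)
    seen.synchronized { seen.toList }
  }

  def main(args: Array[String]): Unit =
    println(run(Seq(Seq("a", "b"), Seq("c"))))
}
```

collect() is only sensible when each batch is small; for real volumes, printing the projected stream or writing the values to an external sink is the safer choice. The timeout in run is an arbitrary estimate and may need raising on a slow machine.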

Upvotes: 3
