Anonymous

Reputation: 29

Scala: Splitting the data coming from Kafka via a DStream

I am receiving data from Kafka in the following form:

{"email":"test@example","firstname":"Example","lastname":"User"}

I want to access the email and first name and compare them with data coming from Cassandra, which has the form:

CassandraRow{email: [email protected]}

Upvotes: -1

Views: 65

Answers (1)

Alex Ott

Reputation: 87299

You need to perform a join with Cassandra using the joinWithCassandraTable function...

To make the join more efficient, you may need to repartition the RDD that you get from Kafka so that it matches the partitioning of the Cassandra table. The code could look like this:

val resultRdd = kafkaRDD.repartitionByCassandraReplica("ks","emails")
   .joinWithCassandraTable("ks","emails")

After that, you can check whether the names match, etc. After the join you should get only the records whose emails exist in Cassandra...
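
Before the join, each Kafka message has to be turned into a value that carries the email and first name. The following is only a minimal sketch of that step in plain Scala, not part of the connector API: `KafkaUser`, `KafkaUserParser`, `parse` and `sameFirstName` are hypothetical names, the regex-based extraction is a simplification (a real job would more likely use a JSON library), and it assumes the Cassandra table also has a first-name column, which the question does not show.

```scala
import scala.util.matching.Regex

// Hypothetical holder for the two fields the question cares about.
final case class KafkaUser(email: String, firstname: String)

object KafkaUserParser {
  // Builds a regex that captures the string value of a JSON key.
  // Simplification: does not handle escaped quotes or nested objects.
  private def stringField(name: String): Regex =
    ("\"" + java.util.regex.Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"").r

  private val Email: Regex = stringField("email")
  private val FirstName: Regex = stringField("firstname")

  // Returns None when either field is missing, so malformed
  // records can be dropped with flatMap.
  def parse(json: String): Option[KafkaUser] =
    for {
      e <- Email.findFirstMatchIn(json)
      f <- FirstName.findFirstMatchIn(json)
    } yield KafkaUser(e.group(1), f.group(1))

  // Case-insensitive comparison of the Kafka first name with the
  // value read from Cassandra (assumed to be available after the join).
  def sameFirstName(user: KafkaUser, cassandraFirstName: String): Boolean =
    user.firstname.equalsIgnoreCase(cassandraFirstName)
}
```

Under these assumptions, the message values from Kafka could be passed through `KafkaUserParser.parse` with `flatMap` before the join, and `sameFirstName` applied to each joined pair afterwards.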

Upvotes: 0
