shaikh

Reputation: 592

Spark Streaming with Python: Joining two stream with respect to a particular attribute

I am receiving two socket streams, S1 and S2, each with its own schema.

I would like to join S1 and S2 on attribute "a" using Spark Streaming. Here is my code:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[3]", "StreamJoin")
    ssc = StreamingContext(sc, 1)  # 1-second batch interval

    S1 = ssc.socketTextStream("localhost", 9999)
    S2 = ssc.socketTextStream("localhost", 8085)

    # Create windowed streams (window lengths in seconds)
    wS1 = S1.window(10)
    wS2 = S2.window(1)

    wS1.flatMap(lambda line: line.split(",")).pprint()
    wS2.flatMap(lambda line: line.split(",")).pprint()

    # Perform the join
    joinedStream = wS1.join(wS2)

    joinedStream.foreachRDD(lambda rdd: rdd.foreach(print))

    ssc.start()
    ssc.awaitTermination()

Both S1 and S2 are comma separated.

The code above does perform a join, but it joins on the complete row.

I want to join the two streams on a particular attribute, in this case attribute 'a'. How can I achieve this?

Thanks a lot!

Upvotes: 0

Views: 830

Answers (1)

user3689574

Reputation: 1676

In Spark, join operates on pairs: each element must be a (key, value) tuple, and records are matched on the key, i.e. the value at element [0] of the tuple. So you can do:

kS1 = wS1.map(lambda line: line.split(",")).map(lambda fields: (fields[0], fields))
kS2 = wS2.map(lambda line: line.split(",")).map(lambda fields: (fields[0], fields))

joinedStream = kS1.join(kS2)

The join is then performed on the first element of the split list, i.e. on attribute 'a' if it is the first comma-separated field of each record.
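To see what this key-based join produces, here is a plain-Python sketch (not Spark code) of the inner-join semantics Spark applies per batch; the sample records and the `pair_join` helper are hypothetical, with the first comma-separated field used as the join key:

```python
from collections import defaultdict

def pair_join(left, right):
    # Emulates Spark's key-based inner join: for every key present in
    # both inputs, emit (key, (left_value, right_value)).
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)
    return [(key, (lv, rv)) for key, lv in left for rv in right_by_key[key]]

# Hypothetical comma-separated records; the first field is the join key.
s1_lines = ["a1,x,1", "a2,y,2"]
s2_lines = ["a1,p", "a3,q"]

k1 = [(fields[0], fields) for fields in (line.split(",") for line in s1_lines)]
k2 = [(fields[0], fields) for fields in (line.split(",") for line in s2_lines)]

joined = pair_join(k1, k2)
# Only key "a1" appears in both inputs, so only its records are paired.
```

Each joined element has the shape `(key, (left_record, right_record))`, which is also what the Spark join emits per matching key.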

Docs reference:

https://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=join#pyspark.RDD.join

Upvotes: 0
