Reputation: 11
I am using Apache Spark Streaming with a TCP connector to receive data. I have a Python application that connects to a sensor, creates a TCP server that waits for a connection from Apache Spark, and then sends JSON data through that socket.
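For reference, here's a rough sketch of what I have now (host, port, and the read_sensor() helper are placeholders for my actual code):

```python
# Sensor side: a minimal TCP server that pushes one JSON record per line.
import json
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("0.0.0.0", 9999))
server.listen(1)
conn, _ = server.accept()  # block until Spark's receiver connects
while True:
    reading = {"sensor_id": 1, "value": read_sensor()}  # read_sensor() is my own code
    conn.sendall((json.dumps(reading) + "\n").encode("utf-8"))
```

```python
# Spark side: connect to that server and parse each line as JSON.
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="SensorStream")
ssc = StreamingContext(sc, 5)  # 5-second batches
records = ssc.socketTextStream("sensor-host", 9999).map(json.loads)
records.pprint()
ssc.start()
ssc.awaitTermination()
```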
How can I join many independent sensor sources so that they all send data to the same receiver on Apache Spark?
Upvotes: 1
Views: 152
Reputation: 1305
It sounds like you need message-oriented middleware (MOM) or a Kafka cluster to handle real-time data feeds. Your message producers can send to a Kafka topic, and Spark Streaming can receive from that topic; that way you decouple the producers from the receiver. Kafka scales linearly, and using it with Spark Streaming's Kafka direct-stream approach with back-pressure enabled gives you good failover resiliency. If you choose another MOM, you can use the Spark receiver-based approach instead and union multiple streams to scale it up.
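A minimal sketch of that pipeline, assuming the kafka-python client on the producer side and Spark's Kafka 0.8 direct-stream integration (KafkaUtils) on the consumer side; the broker address, topic name, and read_sensor() helper are placeholders:

```python
# Producer side: each sensor publishes readings to a shared Kafka topic.
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka-broker:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("sensor-readings", {"sensor_id": 1, "value": read_sensor()})
producer.flush()
```

```python
# Consumer side: the direct (receiver-less) Kafka stream with back-pressure.
import json
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

conf = SparkConf().setAppName("SensorConsumer")
conf.set("spark.streaming.backpressure.enabled", "true")  # throttle ingestion when batches lag
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)

stream = KafkaUtils.createDirectStream(
    ssc, ["sensor-readings"], {"metadata.broker.list": "kafka-broker:9092"}
)
readings = stream.map(lambda kv: json.loads(kv[1]))  # (key, value) pairs; value holds the JSON
readings.pprint()
ssc.start()
ssc.awaitTermination()
```

With the direct stream there is no long-running receiver to lose: each batch reads the Kafka partitions in parallel, so adding a new sensor is just another producer writing to the same topic.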
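And the receiver-based alternative, continuing from the ssc above: one receiver per source, unioned into a single DStream (hostnames are placeholders):

```python
# Receiver-based scale-out: union several socket streams into one DStream.
streams = [ssc.socketTextStream(host, 9999) for host in ("sensor-1", "sensor-2")]
merged = ssc.union(*streams)
```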
Upvotes: 2