Reputation: 777
I'm having a look at this Spark sample:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#a-quick-example
I would like to implement an "echo" application: a Spark Streaming program that reads some characters from a socket and outputs the same characters on the same socket (this is a simplification of my real problem; of course I have some processing to do on the input, but let's not worry about that for now).
I tried to implement a CustomReceiver following this guide: https://spark.apache.org/docs/2.3.0/streaming-custom-receivers.html
I added a method getSocket:
def getSocket(): Socket = {
  socket
}
And I tried to invoke it like this:
val receiver = new SocketReceiver2("localhost", 9999, StorageLevel.MEMORY_AND_DISK_2)
val lines = ssc.receiverStream(receiver)

lines.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    val os = receiver.getSocket().getOutputStream()
    partitionOfRecords.foreach(record => os.write(record.getBytes()))
  }
}
But I get a Task not serializable error (as T.Gaweda pointed out, this is to be expected). So the next step would be to start using accumulators...
Is there any simpler way to do my "echo" application in Spark Streaming?
(Do I really need to use Kafka (HDFS, Hive, ...) to send data back and forth from a simple Java app?)
Upvotes: 1
Views: 198
Reputation: 16076
Receivers are sent to workers, where they are executed. You can see in the Receiver class that it implements Serializable.
In your code, you have a socket field, probably of type Socket, which is not serializable; that is why the closure that captures the receiver cannot be shipped to the executors.
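A simpler way to get your "echo" behavior is to not capture the receiver at all and instead open the connection on the worker side, inside foreachPartition. This is the pattern described in the "Design Patterns for using foreachRDD" section of the programming guide. A minimal sketch (the host and port are placeholders for wherever your echo server actually listens):

lines.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // This block runs on the executor, so nothing from the driver has to
    // be serialized except the plain values captured by the closure
    val socket = new java.net.Socket("localhost", 9999)
    val os = socket.getOutputStream
    partitionOfRecords.foreach(record => os.write(record.getBytes("UTF-8")))
    os.flush()
    socket.close()
  }
}

Note that "localhost" here resolves on the executor, not on your machine, so in a distributed setup use the real hostname of the socket server. Opening one connection per partition per batch is fine for a sketch; for real workloads the guide suggests reusing connections through a pool.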
Upvotes: 1