Abhishek Choudhary

Reputation: 8405

Exception: could not open socket on pyspark

Whenever I try to execute simple processing in PySpark, it fails to open a socket.

>>> myRDD = sc.parallelize(range(6), 3)
>>> sc.runJob(myRDD, lambda part: [x * x for x in part])

The above throws this exception:

port 53554 , proto 6 , sa ('127.0.0.1', 53554)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/context.py", line 917, in runJob
    return list(_load_from_socket(port, mappedRDD._jrdd_deserializer))
  File "/Volumes/work/bigdata/spark-custom/python/pyspark/rdd.py", line 143, in _load_from_socket
    raise Exception("could not open socket")
Exception: could not open socket

>>> 15/08/30 19:03:05 ERROR PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:613)

I checked _load_from_socket in rdd.py and realised it gets the port, but the server is not even started, or else runJob might be the issue:

port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
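To narrow it down, here is a small diagnostic sketch (not part of pyspark) that attempts the same client-side connection _load_from_socket makes; the port number is the one from the traceback above and is just an example:

import socket

# Hedged diagnostic: try the connect that pyspark's _load_from_socket
# performs against the port returned by PythonRDD.runJob.
# 53554 is the port from the traceback; substitute the one you see.
def can_connect(port, host="127.0.0.1", timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(can_connect(53554))  # False when nothing is accepting on that port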

Upvotes: 5

Views: 8104

Answers (3)

Simon Zhang

Reputation: 11

Finally, I solved my problem.

When I started pyspark, I noticed a warning which might be connected to the issue:

WARN Utils:66 - Your hostname, localhost resolves to a loopback address: 127.0.0.1; using 172.16.20.244 instead (on interface en0)
2020-09-27 17:26:10 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address

Then I changed /etc/hosts, commenting out the existing entries and adding a new line to solve the loopback problem, like this:

#127.0.0.1  localhost
#255.255.255.255    broadcasthost
#::1             localhost
172.16.20.244 localhost

It worked.
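The warning itself also points at SPARK_LOCAL_IP, so as an alternative to editing /etc/hosts, here is a minimal sketch, assuming the variable is inherited by the JVM that SparkContext launches (so it must be set before the context is created):

import os

# Hedged alternative: bind Spark to the loopback address explicitly
# instead of editing /etc/hosts. Assumption: setting the variable here,
# before SparkContext starts the JVM, is enough for it to be picked up.
os.environ["SPARK_LOCAL_IP"] = "127.0.0.1"

from pyspark import SparkContext

sc = SparkContext("local[*]", "socket-check")
myRDD = sc.parallelize(range(6), 3)
print(sc.runJob(myRDD, lambda part: [x * x for x in part]))
sc.stop()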

I hope this helps anyone who has struggled with this problem and sees similar warnings.

Upvotes: 1

omarc7

Reputation: 332

I was having the exact same error. I tried JDK 1.7 and it didn't work; then I edited the /etc/hosts file and realized I had the following lines:

127.0.0.1 mbp.local localhost
127.0.0.1 localhost

Commenting out the line with my computer's local name fixed it:

#127.0.0.1 mbp.local localhost
127.0.0.1 localhost

Tested on PySpark 1.6.3 and 2.0.2 with JDK 1.8
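A quick way to verify the fix (a sketch, not from the original answer) is to check what localhost and the machine's hostname resolve to after editing the file:

import socket

# Hedged check: after editing /etc/hosts, both names should resolve to
# an address the driver can actually reach.
print(socket.gethostbyname("localhost"))           # expect 127.0.0.1
print(socket.gethostbyname(socket.gethostname()))  # the machine's own name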

Upvotes: 2

Abhishek Choudhary

Reputation: 8405

It's not the ideal solution, but now I am aware of the cause. PySpark is unable to create a JVM socket with the JDK 1.8 (64-bit) version, so I just set my Java path to JDK 1.7 and it worked.
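A minimal sketch of that workaround (the JDK install path below is hypothetical; adjust it for your machine). JAVA_HOME has to be set before the SparkContext launches the JVM:

import os
import subprocess

# Hedged sketch: point Spark at a JDK 1.7 install before the JVM starts.
# The path is an assumption; adjust it for your install.
os.environ["JAVA_HOME"] = "/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home"

# Confirm which java Spark will pick up.
subprocess.call([os.path.join(os.environ["JAVA_HOME"], "bin", "java"), "-version"])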

Upvotes: 4
