Frying Pan

Reputation: 407

py4j.protocol.Py4JNetworkError: Answer from Java side is empty

This is the code I am using on Google Colab. It keeps getting stuck at the cv.fit call (the CountVectorizer step) and then throws the exception below. I haven't been able to find a solution anywhere. Memory usage on Colab also climbs very high, so I'm starting to think there's a memory leak in the Spark NLP library.

import sparknlp
spark = sparknlp.start()

data = spark.read.csv("60days-ofdata.csv", header=True)

from sparknlp.pretrained import PretrainedPipeline
from sparknlp import Finisher
from pyspark.ml import Pipeline
from pyspark.ml.feature import StopWordsRemover, CountVectorizer

finisher = Finisher().setInputCols(["token", "lemmas", "pos"])
explain_pipeline_model = PretrainedPipeline("explain_document_ml").model

pipeline = Pipeline() \
    .setStages([
        explain_pipeline_model,
        finisher
        ])

model = pipeline.fit(data.select('text'))
annotations_finished_df = model.transform(data.select('text'))

remover = StopWordsRemover(inputCol="finished_lemmas", outputCol="filtered")
filtered_df = remover.transform(annotations_finished_df)
filtered_df.show()

cv = CountVectorizer(inputCol="filtered", outputCol="features")
model = cv.fit(filtered_df.select('filtered'))  # <-- the error is thrown here
result = model.transform(filtered_df.select('filtered'))

Error:

INFO:py4j.java_gateway:Error while receiving.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1207, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1033, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.7/dist-packages/py4j/java_gateway.py", line 1212, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
---------------------------------------------------------------------------
Py4JError                                 Traceback (most recent call last)
<ipython-input-8-0caf2f9be8f3> in <module>()
      5 
      6 cv = CountVectorizer(inputCol="filtered", outputCol="features")
----> 7 model = cv.fit(filtered_df.select('filtered'))
      8 result = model.transform(filtered_df.select('filtered'))
      9 result.show()

5 frames
/usr/local/lib/python3.7/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    334             raise Py4JError(
    335                 "An error occurred while calling {0}{1}{2}".
--> 336                 format(target_id, ".", name))
    337     else:
    338         type = answer[1]

Py4JError: An error occurred while calling o538.fit
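One detail worth noting: Spark evaluates transformations lazily, and filtered_df.show() only computes the first few rows, so cv.fit is the first step that pushes the entire dataset through the pretrained explain_document_ml pipeline and then builds a vocabulary over every distinct term. Memory pressure from both would surface there. The pure-Python sketch below illustrates, on a toy corpus, what a CountVectorizer-style fit computes: term and document frequencies for every distinct term, a minimum-document-frequency filter, and a cut to the most frequent terms. It is not Spark's implementation; build_vocabulary, count_vector, vocab_size and min_df are hypothetical names that only mirror the idea behind CountVectorizer's vocabSize and minDF parameters.

```python
from collections import Counter


def build_vocabulary(docs, vocab_size=2**18, min_df=1):
    """Toy CountVectorizer-style fit over already-tokenized documents.

    Counts total term frequency and document frequency for every distinct
    term, drops terms seen in fewer than min_df documents (an integer count
    here; Spark also accepts a fraction), then keeps the vocab_size most
    frequent terms. Ties are broken alphabetically so the result is stable.
    The default vocab_size mirrors Spark's vocabSize default of 2**18.
    """
    term_freq = Counter()
    doc_freq = Counter()
    for tokens in docs:
        term_freq.update(tokens)       # every occurrence counts
        doc_freq.update(set(tokens))   # each document counts once per term
    kept = [term for term in term_freq if doc_freq[term] >= min_df]
    kept.sort(key=lambda term: (-term_freq[term], term))
    return kept[:vocab_size]


def count_vector(tokens, vocabulary):
    """Map one tokenized document to dense term counts indexed by vocabulary."""
    index = {term: i for i, term in enumerate(vocabulary)}
    counts = [0] * len(vocabulary)
    for token in tokens:
        if token in index:  # out-of-vocabulary tokens are ignored
            counts[index[token]] += 1
    return counts
```

In the question's code, the corresponding knobs would be passed as CountVectorizer(inputCol="filtered", outputCol="features", vocabSize=..., minDF=...). A smaller vocabSize or a larger minDF shrinks the vocabulary stored in the fitted model, though it does not reduce the cost of running the pretrained pipeline over every row.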

Upvotes: 0

Views: 9093

Answers (1)

AlbertoAndreotti

Reputation: 510

mck has provided a good answer. I will add that, in spark-nlp 3.0.0 and later, you can pass a memory parameter to the start() function:

import sparknlp
spark = sparknlp.start(memory="16G")

to give the driver 16 GB of RAM. That may solve the problem.
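The memory value only helps if the machine actually has that much RAM; on a Colab VM with less physical memory than requested, the JVM can still be killed once it grows past what is available. A minimal sketch, assuming a Linux runtime where /proc/meminfo exists (as on Colab): read MemTotal and leave some headroom for Python and the OS. The helper names and the 0.75 fraction are hypothetical choices, not part of Spark NLP.

```python
import re


def mem_total_kib(meminfo_text):
    """Parse MemTotal (reported by Linux in kB, i.e. KiB) from meminfo text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1))


def read_mem_total_kib(path="/proc/meminfo"):
    """Read total RAM from the Linux meminfo file (assumes a Linux runtime)."""
    with open(path) as f:
        return mem_total_kib(f.read())


def suggest_driver_memory(total_kib, fraction=0.75):
    """Return a JVM-style size string such as '9G' from a fraction of total RAM.

    The fraction is a hypothetical headroom choice; the result is rounded
    down to whole GiB and never goes below 1G.
    """
    gib = int(total_kib * fraction // (1024 * 1024))
    return f"{max(gib, 1)}G"
```

On Colab, the result could then be passed as spark = sparknlp.start(memory=suggest_driver_memory(read_mem_total_kib())).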

Upvotes: 1
