Diego-MX

Reputation: 2339

Error when writing dataframe with PySpark

I am not able to save a table to any of a few different destinations.
I have tried the following:

So my deduction is that the job fails before it even gets to writing in any form, but I can't figure out how to find out more about it.
The error logs are, if not identical, very similar. The one I found most odd refers to a module named "src" that is not found. This is the part I found most repetitive and pertinent:

/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/
lib/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in
get_return_value(answer, gateway_client, target_id, name)
    326             raise Py4JJavaError(
    327                 "An error occurred while calling {0}{1}{2}.\n".
--> 328                 format(target_id, ".", name), value)
    329         else:
    330             raise Py4JError(

Py4JJavaError: An error occurred while calling o877.saveAsTable.
: org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:224)

...

File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera4-1.cdh5.13.3.p0.611179/
  lib/spark2/python/pyspark/serializers.py", line 566, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'src'
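For context on that last frame: PySpark ships Python functions to the executors with pickle, and pickle stores an ordinary module-level function by reference, i.e. as its module name plus qualified name. So if anything in the job was defined in a local package called "src", every executor must also be able to import "src", or pickle.loads fails exactly like this. A minimal, Spark-free sketch of the by-reference behavior (names here are illustrative, not from the job above):

```python
import pickle

def transform(x):
    # An ordinary module-level function.
    return x + 1

payload = pickle.dumps(transform)

# pickle records *where* to find the function, not its bytecode:
# the defining module's name (typically "__main__" in a script)
# and the function's own name are embedded in the payload.
assert transform.__module__.encode() in payload
assert b"transform" in payload

# Unpickling re-imports that module and looks the name up again.
# If the module (e.g. a project package named "src") cannot be
# imported where loads() runs -- such as on a Spark executor --
# this is the point where ModuleNotFoundError would be raised.
restored = pickle.loads(payload)
assert restored(41) == 42
```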
 

Thanks for checking it out.
Cheers.

Upvotes: 0

Views: 1947

Answers (1)

Diego-MX

Reputation: 2339

I found out the problem behind this dataframe.
It was not in the writer itself, but in the intermediate table calculations.

As @kfkhalili pointed out, it's a good idea to do sporadic .show()s along the way to verify that each step is running smoothly.

Thanks.

Upvotes: 1
