peter_m

Reputation: 1

Pyspark in Impact - Parquet Failure: py4j.protocol.Py4JError: An error occurred while calling o1271.parquet


Title: Issue with Identifying Home Location and Validating Potential Era in PySpark

I have written code with four methods that perform different calculations. Each row represents one trip of an airplane, and I want to determine each airplane's home airport based on various calculations (e.g., where the airplane was most frequently located). Additionally, I want to identify different usage periods. The methods, in the order they are called, are:

1. initial_check
2. identify_home_location
3. generate_tour_ids (here, multiple trips can be grouped into a tour)
4. validate_potential_era
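To make the home-airport idea concrete, here is a minimal plain-Python sketch of just the "most frequently located" rule. It is not the actual PySpark implementation; the field names `aircraft_id` and `destination` and the sample rows are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def most_frequent_airport(trips):
    """Return {aircraft_id: airport that appears most often as a destination}."""
    counts = defaultdict(Counter)
    for trip in trips:
        counts[trip["aircraft_id"]][trip["destination"]] += 1
    # most_common(1) yields [(airport, count)]; on a tie, the airport
    # that was counted first wins.
    return {aircraft: c.most_common(1)[0][0] for aircraft, c in counts.items()}


# Hypothetical sample data: one dict per trip (one row per trip in the real table).
trips = [
    {"aircraft_id": "A1", "destination": "FRA"},
    {"aircraft_id": "A1", "destination": "MUC"},
    {"aircraft_id": "A1", "destination": "FRA"},
    {"aircraft_id": "B2", "destination": "HAM"},
]
home = most_frequent_airport(trips)
```

The real identify_home_location combines several such calculations; this only shows the simplest one.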

To identify multiple eras, these methods are called twice. The first run works perfectly and produces the expected table. During the second run, however, the following error is sometimes thrown in identify_home_location or validate_potential_era:

Traceback (most recent call last):
  File "", line 1, in
  File "/scratch/asset-install/f3bbfa2d5e9e3fb2c48ab2c93eda92f0/miniconda38/lib/python3.8/site-packages/foundry_pyls/spark.py", line 80, in dump_dataframe_as_parquet
    df.limit(MAX_OUTPUT_ROWS).write.parquet(
  File "/scratch/asset-install/f3bbfa2d5e9e3fb2c48ab2c93eda92f0/miniconda38/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 1721, in parquet
    self._jwrite.parquet(path)
  File "/scratch/asset-install/f3bbfa2d5e9e3fb2c48ab2c93eda92f0/miniconda38/lib/python3.8/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/scratch/asset-install/f3bbfa2d5e9e3fb2c48ab2c93eda92f0/miniconda38/lib/python3.8/site-packages/pyspark/errors/exceptions/captured.py", line 179, in deco
    return f(*a, **kw)
  File "/scratch/asset-install/f3bbfa2d5e9e3fb2c48ab2c93eda92f0/miniconda38/lib/python3.8/site-packages/py4j/protocol.py", line 334, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o1271.parquet
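For context, the way the four methods are chained and run twice can be sketched roughly as follows. This is a simplified outline, not the real code: it assumes each method takes a DataFrame and returns a DataFrame, and that the output of the first pass is the input of the second. The stub steps below only record the call order so the sketch runs on its own.

```python
def run_era_passes(df, steps, n_passes=2):
    """Apply every step in order, n_passes times, feeding each result into the next step."""
    for _ in range(n_passes):
        for step in steps:
            df = step(df)
    return df


STEP_NAMES = [
    "initial_check",
    "identify_home_location",
    "generate_tour_ids",
    "validate_potential_era",
]

calls = []  # records the order in which the (stub) steps were invoked


def make_stub(name):
    """Build a placeholder step that logs its name and returns the input unchanged."""
    def step(df):
        calls.append(name)
        return df
    return step


# A plain string stands in for the trips DataFrame in this sketch.
result = run_era_passes("trips_df_placeholder", [make_stub(n) for n in STEP_NAMES])
```

In the real job, the error appears only while the second pass is running, even though the same methods ran in the first pass without complaint.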

We are working with PySpark on Impact. Has anyone encountered the same error who can help me resolve it?

I simplified the validate_era function so that only dummy values are inserted into the specific columns. This helped, and I could execute each function twice without problems. What I don't understand is this: the original code in validate_era should be fine, since it runs in the first pass without throwing an error.

Upvotes: 0

Views: 26

Answers (0)
