Flavio Pegas

Reputation: 468

Spark fails to merge parquet files (INTEGER -> DECIMAL)

I've got two parquet files.

The first one contains the following column: DECIMAL: decimal(38,18) (nullable = true)

The second one has the same column, but with a different type: DECIMAL: integer (nullable = true)

I want to merge them, but I can't simply read them separately and cast the specific column, because this is part of an app that receives many distinct parquet schemas. I need something that covers every scenario.

I am reading both like this:

df = spark.read.format("parquet").load(['path_to_file_one', 'path_to_file_2'])

It fails with the error below when I try to display the data:

Parquet column cannot be converted. Column: [DECIMAL], Expected: DecimalType(38,18), Found: INT32

I am using Azure Databricks.

I have uploaded the parquet files here: https://easyupload.io/m/su37e8

Is there any way I can force Spark to automatically cast the column to the type used for the same column in the other file?

It should be easy; all the columns are nullable...

Upvotes: 3

Views: 2047

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

This is expected behavior when you provide an external schema that defines the column's datatype as decimal and the column's data is not stored as decimal(38,18).
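For example, a minimal sketch (assuming an active spark session and reusing the path names from the question) that forces a decimal(38,18) schema onto the file whose column is physically stored as INT32 reproduces the error:

    from pyspark.sql.types import StructType, StructField, DecimalType

    # Declare the column as decimal(38,18), matching the first file's schema.
    schema = StructType([StructField("DECIMAL", DecimalType(38, 18), True)])

    # The second file physically stores the column as INT32, so the scan
    # cannot convert it and raises the error from the question.
    df = spark.read.schema(schema).parquet("path_to_file_2")
    df.show()  # Parquet column cannot be converted ... Found: INT32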


We found that this is a limitation in Spark for columns with the datatype decimal(38,18).

Try df.show() to display the results.
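As a generic workaround (a minimal sketch, not part of the original answer: it assumes all files share the same column names and that the first file carries the widest types, here decimal(38,18)), you can read each file with its own schema, cast every DataFrame to one target schema, and union the results:

    from functools import reduce
    from pyspark.sql import functions as F

    paths = ["path_to_file_one", "path_to_file_2"]

    # Read each file separately so Spark never has to convert types at scan time.
    dfs = [spark.read.parquet(p) for p in paths]

    # Take the first file's schema as the target; casting an integer column
    # to decimal(38,18) is lossless, so the INT32 column widens safely.
    target = dfs[0].schema
    aligned = [
        df.select([F.col(f.name).cast(f.dataType).alias(f.name) for f in target.fields])
        for df in dfs
    ]

    # Union all aligned DataFrames into a single result.
    merged = reduce(lambda a, b: a.unionByName(b), aligned)
    merged.show()

Picking the widest type per column, rather than blindly taking the first file's schema, would make this robust when the wider type appears in a later file.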


Upvotes: 1
