Flavio Pegas

Reputation: 468

Spark fails to merge parquet files (INTEGER -> DECIMAL)

I've got two parquet files.

The first one contains the following column: DECIMAL: decimal(38,18) (nullable = true)

The second one has the same column, but with a different type: DECIMAL: integer (nullable = true)

I want to merge them, but I can't simply read them separately and cast the specific column, because this is part of an app that receives many distinct parquet schemas. I need something that covers every scenario.

I am reading both like this:

df = spark.read.format("parquet").load(['path_to_file_one', 'path_to_file_2'])

It fails with the error below when I try to display the data:

Parquet column cannot be converted. Column: [DECIMAL], Expected: DecimalType(38,18), Found: INT32

I am using Azure Databricks.

I have uploaded the parquet files here: https://easyupload.io/m/su37e8

Is there any way I can force Spark to automatically cast the column to the type used for the same column in the other file?

It should be easy; all the columns are nullable...

Upvotes: 3

Views: 2047

Answers (1)

CHEEKATLAPRADEEP

Reputation: 12768

This is expected behavior when you provide an external schema that defines the column's datatype as decimal and the column's data is not stored as decimal(38,18).
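For example, a minimal sketch (assuming an active spark session and reusing the path names from the question) that forces a decimal(38,18) schema onto the file whose column is physically stored as INT32 reproduces the error:

    from pyspark.sql.types import StructType, StructField, DecimalType

    # Declare the column as decimal(38,18), matching the first file's schema.
    schema = StructType([StructField("DECIMAL", DecimalType(38, 18), True)])

    # The second file physically stores the column as INT32, so the scan
    # cannot convert it and raises the error from the question.
    df = spark.read.schema(schema).parquet("path_to_file_2")
    df.show()  # Parquet column cannot be converted ... Found: INT32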


We found that this is a limitation in Spark for columns with the datatype decimal(38,18).

Try df.show() to display the results.
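As a generic workaround (a minimal sketch, not part of the original answer: it assumes all files share the same column names and that the first file carries the widest types, here decimal(38,18)), you can read each file with its own schema, cast every DataFrame to one target schema, and union the results:

    from functools import reduce
    from pyspark.sql import functions as F

    paths = ["path_to_file_one", "path_to_file_2"]

    # Read each file separately so Spark never has to convert types at scan time.
    dfs = [spark.read.parquet(p) for p in paths]

    # Take the first file's schema as the target; casting an integer column
    # to decimal(38,18) is lossless, so the INT32 column widens safely.
    target = dfs[0].schema
    aligned = [
        df.select([F.col(f.name).cast(f.dataType).alias(f.name) for f in target.fields])
        for df in dfs
    ]

    # Union all aligned DataFrames into a single result.
    merged = reduce(lambda a, b: a.unionByName(b), aligned)
    merged.show()

Picking the widest type per column, rather than blindly taking the first file's schema, would make this robust when the wider type appears in a later file.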


Upvotes: 1
