Reputation: 468
I've got two parquet files.
The first one contains the following column: DECIMAL: decimal(38,18) (nullable = true)
The second one has the same column, but with a different type: DECIMAL: integer (nullable = true)
I want to merge them, but I can't simply read them separately and cast that specific column, because this is part of an app that receives many distinct parquet schemas. I need something that covers every scenario.
I am reading both like this:
df = spark.read.format("parquet").load(['path_to_file_one', 'path_to_file_2'])
It fails with the error below when I try to display the data:
Parquet column cannot be converted. Column: [DECIMAL], Expected: DecimalType(38,18), Found: INT32
I am using Azure Databricks with the following configs:
I have uploaded the parquet files here: https://easyupload.io/m/su37e8
Is there any way I can force Spark to automatically cast a column to the type of the same column in the other dataframe?
It should be easy; all the columns are nullable...
Upvotes: 3
Views: 2047
Reputation: 12768
This is expected if you are providing an external schema where the column's datatype is defined as decimal while the underlying file stores a different type. We found that this is a limitation in Spark for columns with datatype decimal(38,18).
Try df.show()
to display the results.
Upvotes: 1