Reputation: 13091
I have DF1 with schema:
df1 = spark.read.parquet(load_path1)
df1.printSchema()
root
|-- PRODUCT_OFFERING_ID: string (nullable = true)
|-- CREATED_BY: string (nullable = true)
|-- CREATION_DATE: string (nullable = true)
and DF2:
df2 = spark.read.parquet(load_path2)
df2.printSchema()
root
|-- PRODUCT_OFFERING_ID: decimal(38,10) (nullable = true)
|-- CREATED_BY: decimal(38,10) (nullable = true)
|-- CREATION_DATE: timestamp (nullable = true)
Now I want to union these two DataFrames.
Sometimes the union fails because the schemas differ.
How can I make DF2 load with exactly the same schema as DF1?
I tried with:
df2 = spark.read.parquet(load_path2).schema(df1.schema)
Getting error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'StructType' object is not callable
Or should I CAST it instead (once DF2 is read)?
Thanks.
Upvotes: 3
Views: 10522
Reputation: 31470
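Move .schema() before .parquet(), so that Spark reads the Parquet file with the specified schema.
Your attempt fails because spark.read.parquet(load_path2) already returns a DataFrame, and on a DataFrame .schema is a property holding a StructType, not a method you can call.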
df2 = spark.read.schema(df1.schema).parquet(load_path2)
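As for the CAST option raised in the question, here is a minimal sketch of casting DF2 after it has been read. The helper name `cast_to_schema` is hypothetical (it is not part of PySpark); it only relies on `DataFrame.select`, `Column.cast` and `Column.alias`, and it assumes both DataFrames use the same column names.

```python
def cast_to_schema(df, target_schema):
    """Return df with every column in target_schema cast to that field's type.

    Hypothetical helper, not a PySpark API. Assumes each field name in
    target_schema also exists as a column in df.
    """
    cols = [
        # Cast to the target type and keep the original column name.
        df[field.name].cast(field.dataType).alias(field.name)
        for field in target_schema.fields
    ]
    return df.select(cols)


# Usage with the DataFrames from the question (df1 and load_path2 as defined there):
# df2 = cast_to_schema(spark.read.parquet(load_path2), df1.schema)
# combined = df1.unionByName(df2)
```

Note that casting a decimal(38,10) or timestamp column to string produces Spark's default text form, which may not match the way the values are formatted in DF1, so it is worth checking a few rows after the union.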
Upvotes: 10