Reputation: 563
I am currently working with some rather complex JSON files that I am supposed to transform and write into a Delta table. The problem is that every file has slight differences in the datatypes of its columns. Can someone explain the general approach for retrieving the datatype of a nested struct column? On the internet, I can only find how to do a select on them: https://sparkbyexamples.com/pyspark/pyspark-select-nested-struct-columns/
In case I have a format like this:
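A minimal sketch of such a file (the original example is missing here; a nested name struct containing a lastname field is inferred from the answers below):

{
    "name": {
        "firstname": "John",
        "lastname": "Smith"
    }
}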
How would I get the datatype of, let's say, lastname?
Edit: the JSON file is of course already loaded into a DataFrame; my question is about how to query the DataFrame in order to retrieve the datatype.
Thanks a lot!
Upvotes: 0
Views: 1140
Reputation: 180
This way avoids running an actual query:
df.schema["name"].dataType["lastname"].dataType
Upvotes: 0
Reputation: 296
You can access the dtype like this:
import pyspark.sql.functions as F
df.select(F.col("name.lastname")).dtypes[0][1]
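Note that .dtypes returns the type as a simplified string (e.g. 'string') rather than a DataType object, and while select builds a new plan, no Spark job runs on the data; only the schema is analyzed.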
Upvotes: 1