Moritz

Reputation: 563

Pyspark: get datatype of Nested struct column

I am currently working with some rather complex JSON files that I am supposed to transform and write into a Delta table. The problem is that every file has slight differences in the datatypes of its columns. Can someone explain the general approach for retrieving the datatype of a nested struct column? Online, I can only find how to do a select on such columns: https://sparkbyexamples.com/pyspark/pyspark-select-nested-struct-columns/

In case I have a format like this:

[schema screenshot: a struct column `name` containing nested fields such as `lastname`]

How would I manage to get the datatype of, let's say, lastname?

Edit: the JSON file is of course already loaded into a DataFrame; my question is about how to query the DataFrame in order to retrieve the datatype.

Thanks a lot!

Upvotes: 0

Views: 1140

Answers (2)

Ignas Kiela

Reputation: 180

This way avoids doing an actual query.

df.schema["name"].dataType["lastname"].dataType

Upvotes: 0

Thijs

Reputation: 296

You can access the dtype (the simple type string) like this:

import pyspark.sql.functions as F
df.select(F.col("name.lastname")).dtypes[0][1]

Upvotes: 1
