Reputation: 11
We are working on migrating from Databricks runtime 9.1 LTS to 10.4 LTS, but we're running into strange behavioral issues. Our existing code works up to runtime 10.3; in 10.4 it stops working.
Problem: We have a nested JSON file that we are flattening into a Spark DataFrame using the code below:
import pyspark.sql.functions as F

adaccountsdf = (df
    .withColumn('Exp_Organizations',
                F.explode(F.col('organizations.organization')))
    .withColumn('Exp_AdAccounts',
                F.explode(F.col('Exp_Organizations.ad_accounts')))
    .select(F.col('Exp_Organizations.id').alias('organizationId'),
            F.col('Exp_Organizations.name').alias('organizationName'),
            F.col('Exp_AdAccounts.id').alias('adAccountId'),
            F.col('Exp_AdAccounts.name').alias('adAccountName'),
            F.col('Exp_AdAccounts.timezone').alias('timezone')))
Now, querying the dataframe works when we do the following select (results hidden due to confidentiality):
display(adaccountsdf.select("*"))
When we display the schema of the dataframe, we get the following:
root
|-- organizationId: string (nullable = true)
|-- organizationName: string (nullable = true)
|-- adAccountId: string (nullable = true)
|-- adAccountName: string (nullable = true)
|-- timezone: string (nullable = true)
so everything looks as it should. But the moment we start selecting from the last three fields (adAccountId, adAccountName and timezone):
display(adaccountsdf.select("adAccountId","adAccountName"))
We get the error AnalysisException: No such struct field id in 0, 1.
However, when I run the statement display(adaccountsdf.select("adAccountId")), it works just fine.
Does anyone know why this is happening? It's a very strange error that only shows up in Databricks runtime 10.4; all previous runtimes, including 10.3, 10.2, 10.1 and 9.1 LTS, work fine. The issue seems to be triggered by calling the explode function on an already-exploded column in the DataFrame.
UPDATE:
For some reason, when I run adaccountsdf.cache() before my select statements, the issue disappears. I would still like to know what causes this issue in runtime 10.4 but not in the earlier ones.
Upvotes: 1
Views: 802
Reputation: 140
In my case, I had a similar issue with Databricks 10.4 LTS: I needed to add several cache() calls to break the execution plan. After opening a support ticket with Microsoft, a bugfix was applied to our image and the problem was resolved (the Catalyst optimizer was infinitely re-optimizing the execution plan for complex types, or for repeated operations on the same column).
Upvotes: 1