Reputation: 8951
I'm getting the following error attempting to flatten a highly nested structure:
org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(error,StructType(StructField(array,ArrayType(StructType(StructField(double,DoubleType,true), StructField(int,IntegerType,true), StructField(string,StringType,true)),true),true), StructField(double,DoubleType,true), StructField(int,IntegerType,true), StructField(string,StringType,true), StructField(struct,StructType(StructField(message,StringType,true), StructField(kind,StringType,true), StructField(stack,StringType,true)),true)),true), StructField(Error,StructType(StructField(array,ArrayType(StringType,true),true), StructField(string,StringType,true)),true)
I can't seem to figure out what in particular is causing this. What is the ambiguity, other than a deeply nested Struct?
Upvotes: 0
Views: 16937
Reputation: 199
This happens when you join two DataFrames that both have a field with the same name. When you reference the duplicated field, Spark doesn't know which column you are requesting. The solution is to rename the field on one side of the join. Example:
You join both DataFrames on column "id" and want to select the "name" column from the second one:
val dfJoined = dfA.join(dfB,Seq("id"),"inner").select("name")
Since the column "name" exists in both DataFrames, Spark cannot tell which "name" you are asking for.
Solution:
val dfRenamedB = dfB.withColumnRenamed("name","b_name")
Now, when you join the two DataFrames, the result has columns "name" and "b_name", so you can tell which one you are selecting.
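Putting the steps above together, here is a minimal sketch. It assumes a SparkSession named spark is in scope; the DataFrame contents are illustrative:

```scala
import spark.implicits._

// Two DataFrames that share the column name "name"
val dfA = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")
val dfB = Seq((1, "Ally"), (2, "Bobby")).toDF("id", "name")

// This would throw AnalysisException: Reference 'name' is ambiguous
// dfA.join(dfB, Seq("id"), "inner").select("name")

// Rename the column on one side before joining:
val dfRenamedB = dfB.withColumnRenamed("name", "b_name")
val dfJoined = dfA.join(dfRenamedB, Seq("id"), "inner")

// Both columns are now unambiguously addressable
dfJoined.select("name", "b_name").show()
```

Alternatively, you can keep the original names and disambiguate with the source DataFrame, e.g. `dfA.join(dfB, Seq("id")).select(dfB("name"))`, but renaming keeps the resulting schema self-describing.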
Upvotes: 2