Reputation: 13686
I have a huge Spark DataFrame which I create using the following statement
val df = sqlContext.read.option("mergeSchema", "true").parquet("parquet/partitions/path")
Now when I try a column rename or a select operation on the above DataFrame, it fails with an ambiguous-column exception:
org.apache.spark.sql.AnalysisException: Reference 'Product_Type' is ambiguous, could be Product_Type#13, Product_Type#235
I inspected the columns and found two, Product_Type
and Product_type
, which appear to be the same column differing only in letter case, presumably created by schema merges over time. I don't mind keeping the duplicate columns, but for some reason Spark's sqlContext doesn't accept them.
I believed the spark.sql.caseSensitive
config defaults to true, so I don't know why this fails. I am using Spark 1.5.2 and am new to Spark.
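For what it's worth, you can spot these case-variant duplicates before they bite by grouping the column names case-insensitively. A minimal sketch (the `columns` array here stands in for `df.columns`; the names are just the ones from this question):

```scala
// Sketch: find column names that collide when compared case-insensitively,
// as can happen after merging Parquet schemas written with inconsistent casing.
// `columns` stands in for df.columns on the merged DataFrame.
val columns = Array("Product_Type", "Product_type", "Price")

val collisions = columns
  .groupBy(_.toLowerCase)                                      // bucket by lowercased name
  .collect { case (_, names) if names.length > 1 => names.toList } // keep only duplicate buckets
  .toList

// collisions == List(List("Product_Type", "Product_type"))
```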
Upvotes: 1
Views: 5096
Reputation: 41987
By default, the spark.sql.caseSensitive
property is false
, so before your rename
or select
statement, you should set it to true:
sqlContext.sql("set spark.sql.caseSensitive=true")
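With that set, the exact-case reference should no longer be ambiguous, and you can rename or drop the stray variant. A sketch, assuming a Spark 1.5.x `sqlContext` and the column names from the question:

```scala
// Sketch: with case-sensitive resolution on, "Product_Type" and
// "Product_type" are distinct columns, so each reference resolves cleanly.
sqlContext.sql("set spark.sql.caseSensitive=true")

// Rename one variant so the two no longer collide case-insensitively...
val renamed = df.withColumnRenamed("Product_Type", "ProductTypeMerged")

// ...or drop the stray case-variant entirely if it's redundant:
val deduped = df.drop("Product_type")
```

Note that `ProductTypeMerged` is just an illustrative name; pick whatever final column name suits your schema.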
Upvotes: 5