skywalkerytx

Spark SQL bug: DataFrame.withColumn() can duplicate a column that already exists

When implementing DataFrame.withColumn(), the Spark dev team forgot to check whether the column name is already in use.
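To make the missing check concrete, here is a toy model in plain Scala. It is a hypothetical sketch, not Spark's source: a DataFrame is reduced to an ordered list of Col values (a column name plus a string describing its expression), and appendAlways and replaceOrAppend are made-up helper names. The first mimics the behaviour shown in this question. The second adds the name check and replaces an existing column in place, which matches how later Spark releases describe withColumn ("adding a column or replacing the existing column that has the same name").

```scala
// Hypothetical toy model, not Spark's implementation: a column is a name
// plus a textual description of the expression that produces it.
final case class Col(name: String, expr: String)

object WithColumnModel {
  // Behaviour seen in the question: the new column is always appended,
  // even when a column with the same name already exists.
  def appendAlways(cols: Vector[Col], name: String, expr: String): Vector[Col] =
    cols :+ Col(name, expr)

  // Behaviour with a name check: if a column called `name` exists, it is
  // replaced in place (same position); otherwise the column is appended.
  def replaceOrAppend(cols: Vector[Col], name: String, expr: String): Vector[Col] =
    if (cols.exists(_.name == name))
      cols.map(c => if (c.name == name) Col(name, expr) else c)
    else
      cols :+ Col(name, expr)
}
```

Starting from the four columns in res.columns, appendAlways with "tablename" produces the names user_id, service_type_id, tablename, dt, tablename (the same list as res1.columns in the question). replaceOrAppend keeps four columns and only swaps in the new expression for tablename.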

To reproduce, start with:

val res = sqlContext.sql("select * from tag.tablename where dt>20150501 limit 1").withColumnRenamed("tablename","tablename")
res.columns

shows:

res6: Array[String] = Array(user_id, service_type_id, tablename, dt)

then

val res1 = res.withColumn("tablename",res("tablename")+1)
res1.columns

shows:

res7: Array[String] = Array(user_id, service_type_id, tablename, dt, tablename)

By the way, res1.show works.

The bug shows up here:

res1.select("tablename")

org.apache.spark.sql.AnalysisException: Ambiguous references to tablename: (tablename#48,List()),(tablename#53,List());
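The error follows directly from the duplicate: both columns are still named tablename (internally tablename#48 and tablename#53), so looking the column up by name finds two candidates. The hypothetical sketch below models only that lookup step in plain Scala; Attr, ResolveModel and the returned messages are illustrative and are not Spark's analyzer code.

```scala
// Hypothetical toy model of name resolution; Spark's analyzer is far more
// involved. Each attribute has a user-visible name and a unique numeric id.
final case class Attr(name: String, id: Long)

object ResolveModel {
  // Returns the single attribute called `name`, or an error message when
  // the name matches no attribute or more than one attribute.
  def resolve(attrs: Seq[Attr], name: String): Either[String, Attr] =
    attrs.filter(_.name == name) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"No column named ${name}")
      case many =>
        Left(s"Ambiguous references to ${name}: " +
          many.map(a => s"${a.name}#${a.id}").mkString(","))
    }
}
```

Presumably this is also why res1.show works: printing every column never requires resolving the name tablename to a single column.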

Answers (1)

Spiro Michaylov

This has already been reported as SPARK-6635. It has been fixed, and the fix appears set for release in Spark 1.4.0.