Case sensitive column drop operation for pyspark dataframe?

Question

From some brief testing, it appears that drop() on a PySpark DataFrame is not case sensitive. For example:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
import sys

sparkSession = SparkSession.builder.appName("my-session").getOrCreate()

dff = sparkSession.createDataFrame([(10,123), (14,456), (16,678)], ["age", "AGE"])

>>> dff.show()
+---+---+
|age|AGE|
+---+---+
| 10|123|
| 14|456|
| 16|678|
+---+---+

>>> dff.drop("AGE")
DataFrame[]

>>> dff_dropped = dff.drop("AGE")
>>> dff_dropped.show()
++
||
++
||
||
||
++

"""
What I'd like to see here is:
+---+
|age|
+---+
| 10|
| 14|
| 16|
+---+
"""

Is there a way to drop DataFrame columns in a case-sensitive way? I have seen some comments about something like this in Spark JIRA discussions, but I am looking for something that applies only to the drop() operation in an ad hoc way, not a global or persistent setting.

Prathik Kini · Accepted Answer

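# Add this before calling drop(); sparkSession is the session created in the question
sparkSession.sql("set spark.sql.caseSensitive=true")

You need to set spark.sql.caseSensitive to true if you have two columns whose names differ only by case. The setting defaults to false and applies to the whole session, so set it back to false afterwards if you only need it for the drop.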
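
If changing a session setting is not an option, one possible workaround is to rename the columns by position, drop by placeholder name, and then restore the original names. The sketch below is only an illustration: drop_exact and the _col_N placeholder names are hypothetical, not part of PySpark, and the helper assumes nothing beyond DataFrame.columns, DataFrame.toDF and DataFrame.select.

```python
def drop_exact(df, name):
    """Drop only the columns whose name equals `name` exactly, case included.

    Hypothetical helper (not a PySpark API). It uses only df.columns,
    df.toDF(*names) and df.select(*names), so it does not depend on the
    spark.sql.caseSensitive setting.
    """
    original = df.columns

    # Placeholder names assigned by position; they stay unique even when
    # column names are compared case-insensitively.
    placeholders = ["_col_{}".format(i) for i in range(len(original))]

    # Pair each placeholder with its original name and keep the ones that
    # do not match `name` exactly.
    kept = [(p, c) for p, c in zip(placeholders, original) if c != name]

    if len(kept) == len(original):
        # No exact match: return the DataFrame unchanged.
        return df

    renamed = df.toDF(*placeholders)                  # rename by position
    selected = renamed.select(*[p for p, _ in kept])  # drop by placeholder
    return selected.toDF(*[c for _, c in kept])       # restore original names
```

With the DataFrame from the question, drop_exact(dff, "AGE").show() should leave only the lowercase age column, and drop_exact(dff, "Age") should return dff unchanged because no column has exactly that name.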
