lampShadesDrifter

Reputation: 4139

Case sensitive column drop operation for pyspark dataframe?

From some brief testing, it appears that the drop() function for PySpark DataFrames is not case sensitive, e.g.:

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("my-session").getOrCreate()

# Two columns whose names differ only by case
dff = sparkSession.createDataFrame([(10, 123), (14, 456), (16, 678)], ["age", "AGE"])

>>> dff.show()
+---+---+
|age|AGE|
+---+---+
| 10|123|
| 14|456|
| 16|678|
+---+---+

>>> dff.drop("AGE")
DataFrame[]

>>> dff_dropped = dff.drop("AGE")
>>> dff_dropped.show()
++
||
++
||
||
||
++

"""
What I'd like to see here is:
+---+
|age|
+---+
| 10|
| 14|
| 16|
+---+
"""

Is there a way to drop DataFrame columns in a case-sensitive way? (I have seen comments related to something like this in Spark JIRA discussions, but I was looking for something that applies only to the drop() operation in an ad hoc way, not a global / persistent setting.)

Upvotes: 3

Views: 2897

Answers (1)

Prathik Kini

Reputation: 1708

# Run this before calling drop()
sqlContext.sql("set spark.sql.caseSensitive=true")

You need to set spark.sql.caseSensitive to true if you have two columns whose names differ only by case.
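Note that this is a session-wide setting, not something scoped to a single drop() call. If you want an ad hoc, case-sensitive drop without flipping the flag, one option is to do the name comparison yourself in plain Python (where string comparison is always case sensitive) and select the surviving columns by position. A minimal sketch, where the helper name `keep_indices` is my own and not a PySpark API:

```python
def keep_indices(columns, to_drop):
    """Return the positions of columns whose names do NOT exactly
    (case-sensitively) match any name in to_drop."""
    drop_set = set(to_drop)
    return [i for i, name in enumerate(columns) if name not in drop_set]

# Plain-Python comparison is case sensitive, so only "AGE" is matched:
print(keep_indices(["age", "AGE"], ["AGE"]))  # -> [0]

# Applied to the DataFrame (requires an active SparkSession; selecting by
# integer position avoids ambiguous name resolution between age/AGE):
# dff_dropped = dff.select(*[dff[i] for i in keep_indices(dff.columns, ["AGE"])])
```

If you do want the session-wide setting but are on a SparkSession rather than a SQLContext, the equivalent is `spark.conf.set("spark.sql.caseSensitive", "true")`.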

Upvotes: 6
