Clay
Clay

Reputation: 2736

Removing "." from Spark DataFrame column names

How do I remove the "." from a Spark DataFrame column name?

The DataFrame.select(F.col().alias()) method to rename column names that have a "." in them throws an error.

The following code is reproducible.

# import Spark libraries, configuration, Contexts, and types.
import pyspark
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
#############
# Start Spark.
spark = SparkSession.builder.appName("test").getOrCreate()

testdf = spark.createDataFrame([
    (1, "Julie", "CEO"),
    (2, "Janice", "CFO"),
    (3, "Jake", "CTO")],
    ["ID", "First Name", "Title Initial."])

# this works just fine.
testdf.select(F.col('First Name').alias('first_name')).show(3)

# This throws an error. 
testdf.select(F.col('Title Initial.').alias('title')).show(3)

Error:

AnalysisException: u'syntax error in attribute name: Title Initial.;'

What is an alternative method to change DataFrame column names that have a "." in them?

Upvotes: 4

Views: 7784

Answers (2)

Ryan Xu
Ryan Xu

Reputation: 21

Today I met the same problem in PySpark 3.3.1 with Python 3.9.16, add '`' at the beginning and the end of the column can works.

...
col_name = 'No.'
df = df.withColumn(col_name, df[col_name].cast('bigint'))

will raise a error: AnalysisException: syntax error in attribute name: No.

After add '`', it works:

df = df.withColumn(col_name, df["`"+col_name+"`"].cast('bigint'))

Upvotes: 2

manojlds
manojlds

Reputation: 301347

Surround the column name with `

testdf.select(F.col('`Title Initial.`').alias('title')).show(3)

Upvotes: 3

Related Questions