crystyxn

Reputation: 1621

Rename or give alias to Python Spark dataframe column names

I am using PySpark 2.4.3 and I have a dataframe that I wish to write to Parquet, but the column names have spaces, such as Hour of day.

df = spark.read.csv("file.csv", header=True)
df.write.parquet('input-parquet/')

I am getting this error currently:

An error occurred while calling o425.parquet.
: org.apache.spark.sql.AnalysisException: Attribute name "Hour of day" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;

How can I either rename the columns or give them aliases to be able to write to Parquet?

Upvotes: 0

Views: 2343

Answers (1)

Bitswazsky

Reputation: 4698

You can rename the column with the withColumnRenamed(existing, new) method and then write to Parquet. Note that withColumnRenamed returns a new DataFrame rather than modifying the original in place, so keep the result before writing. It would be something like this:

df = df.withColumnRenamed('Hour of day', 'Hour')
df.write.parquet('input-parquet/')
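If several columns contain spaces, you can sanitize every name in one pass instead of chaining withColumnRenamed calls. A minimal sketch, assuming the same df and output path as in the question (the underscore replacement character is just a choice for illustration):

import re

# Replace every character Parquet rejects (" ,;{}()\n\t=") with an underscore
safe_names = [re.sub(r'[ ,;{}()\n\t=]', '_', c) for c in df.columns]

# toDF(*cols) reassigns column names positionally and returns a new DataFrame
df = df.toDF(*safe_names)
df.write.parquet('input-parquet/')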

Upvotes: 1
