Reputation: 1621
I am using PySpark 2.4.3 and I have a dataframe that I wish to write to Parquet, but the column names contain spaces, such as Hour of day.
df = spark.read.csv("file.csv", header=True)
df.write.parquet('input-parquet/')
I am getting this error currently:
An error occurred while calling o425.parquet.
: org.apache.spark.sql.AnalysisException: Attribute name "Hour of day" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
How can I either rename the columns or give them aliases to be able to write to Parquet?
Upvotes: 0
Views: 2343
Reputation: 4698
You can rename the column with the withColumnRenamed(existing, new)
method and then write to Parquet. Since DataFrames are immutable, assign the result back before writing:
df = df.withColumnRenamed('Hour of day', 'Hour')
df.write.parquet('input-parquet/')
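If several columns contain spaces, renaming them one by one gets tedious. A minimal sketch (assuming the same file.csv and output path from the question) that replaces spaces with underscores in every column name before writing:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input; substitute your own CSV path.
df = spark.read.csv("file.csv", header=True)

# Replace spaces in every column name so Parquet accepts them.
clean_names = [c.replace(" ", "_") for c in df.columns]
df = df.toDF(*clean_names)

df.write.parquet("input-parquet/")
toDF(*cols) returns a new DataFrame with the given column names, so this handles any number of offending columns in one pass.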
Upvotes: 1