Reputation: 203
I am trying to read a Parquet file with PySpark using the command:
file = spark.read.parquet("/FileStore/tables/file_name.parquet")
The column names in the Parquet file contain spaces, so I tried to rename the columns using:
for c in file.columns:
    file = file.withColumnRenamed(c, c.replace(" ", ""))
When I inspect the column names and the schema, the columns no longer have spaces. However, when I try to display the DataFrame I get the error:
AnalysisException: Attribute name "Col Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;
Any idea how to solve this issue?
Upvotes: 0
Views: 519
Reputation: 69
How was the Parquet file created? If possible, check whether the column names in the file itself contain spaces.
If that is not possible, try passing your own schema to the DataFrame reader API.
Something like this:
schema = "col1 string, col2 int"
# ur_path is the path to your Parquet file
df = (
    spark.read.format("parquet")
    .option("path", ur_path)
    .schema(schema)
    .load()
)
print(df.schema.simpleString())
Please check whether this helps.
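Since the error message itself says "Please use alias to rename it", another thing to try is renaming every column in a single select with alias. The snippet below is only a rough sketch: `sanitize` and `rename_all` are names I made up for illustration, the character list is copied from the error message, and I have not verified that this avoids the exception on your Spark version.

```python
import re

# Characters the AnalysisException lists as invalid in attribute names.
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize(name: str, replacement: str = "") -> str:
    """Return `name` with every invalid character replaced by `replacement`."""
    return INVALID_CHARS.sub(replacement, name)


def rename_all(df):
    """Rename all columns in one select, using alias as the error suggests.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame.
    """
    # Imported here so that sanitize() can be used without Spark installed.
    from pyspark.sql import functions as F

    # Backticks quote the original names, which may contain spaces.
    return df.select([F.col(f"`{c}`").alias(sanitize(c)) for c in df.columns])
```

You would call it as `file = rename_all(spark.read.parquet("/FileStore/tables/file_name.parquet"))`. Passing `replacement="_"` to `sanitize` keeps the word boundaries visible, which may be preferable if two columns would otherwise collapse to the same name.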
Upvotes: 1