Snowfire777

Reputation: 203

PySpark: error while reading parquet file

I am trying to read a parquet file with PySpark using the command:

file = spark.read.parquet("/FileStore/tables/file_name.parquet")

The column names in the parquet file contain spaces, so I tried to rename the columns using:

for c in file.columns:
    file = file.withColumnRenamed(c, c.replace(" ", ""))
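The loop above only removes spaces, but the error message below lists several other characters that Spark rejects (" ,;{}()\n\t="). A minimal sketch in plain Python, using a hypothetical helper `clean_column_names`, that strips all of those characters and checks that the cleaned names stay unique:

```python
# Characters listed as invalid in the AnalysisException message: " ,;{}()\n\t="
# (assumption: this set is taken verbatim from the error text in the question)
INVALID_CHARS = " ,;{}()\n\t="


def clean_column_names(columns, replacement=""):
    """Return new column names with every invalid character replaced.

    Raises ValueError if two cleaned names collide, because the
    DataFrame would otherwise end up with duplicate column names.
    """
    # Map each invalid character's code point to the replacement string
    table = {ord(ch): replacement for ch in INVALID_CHARS}
    cleaned = [name.translate(table) for name in columns]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"cleaned column names are not unique: {cleaned}")
    return cleaned


print(clean_column_names(["Col Name", "price (usd)", "id"]))
# prints ['ColName', 'priceusd', 'id']
```

In PySpark the cleaned list could be applied in one step with `file = file.toDF(*clean_column_names(file.columns))`. Note that, as the question shows, renaming after the read may not by itself avoid the exception, since the scan still refers to the original names.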

When I look at the column names and the schema, my columns no longer have spaces. However, when I try to display the DataFrame, I get the error:

AnalysisException: Attribute name "Col Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias to rename it.;

Any idea how to solve this issue?

Upvotes: 0

Views: 519

Answers (1)

Pyspark Developer

Reputation: 69

How was the parquet file created? If possible, check whether the column mapping used to write it has spaces in the column names.

If that is not possible, try passing your own schema to the DataFrameReader API.

Something like this:

# ur_path: path to your parquet file
schema = "col1 string, col2 int"

df = (spark.read.format("parquet")
      .option("path", ur_path)
      .schema(schema)
      .load())
print(df.schema.simpleString())
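If the file has many columns, the DDL schema string can be generated rather than typed by hand. A small sketch, assuming a hypothetical helper `ddl_schema` and that the column names and Spark SQL type names are known in advance (the names below are placeholders):

```python
def ddl_schema(fields):
    """Build a Spark DDL schema string such as "col1 string, col2 int"
    from an ordered mapping of column name -> Spark SQL type name."""
    return ", ".join(f"{name} {dtype}" for name, dtype in fields.items())


schema = ddl_schema({"col1": "string", "col2": "int"})
print(schema)  # col1 string, col2 int
```

The resulting string can be passed to `.schema(...)`, which accepts a DDL string as well as a `StructType`.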

Please check whether it helps.

Upvotes: 1
