Taylrl

Reputation: 3919

pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

This question has a different answer from those given in the similar post mentioned below.

I am getting an error that reads

pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'

when I try to read in a Parquet file like this, using Spark 2.1.0:

data = spark.read.parquet('/myhdfs/location/')

I have checked that the file/table is not empty by looking at the Impala table through the Hue web portal. Also, other files that I have stored in similar directories read in absolutely fine. For the record, the file names contain hyphens but no underscores or full stops/periods (Spark skips files whose names start with an underscore or a period).

Hence, none of the answers in the post "Unable to infer schema when loading Parquet file" apply.
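The underscore/period point refers to Spark treating some files as "hidden" when it lists a directory. The sketch below approximates that rule in plain Python so you can check which entries in a local copy of a directory Spark would actually consider. The rule is my reading of Spark's file-listing filter and may differ between Spark versions; `spark_skips` and `visible_entries` are hypothetical helper names, not Spark APIs.

```python
import os
import tempfile


def spark_skips(name: str) -> bool:
    """Approximation (an assumption) of Spark's hidden-file filter."""
    exclude = (
        (name.startswith("_") and "=" not in name)  # e.g. _SUCCESS, but not _col=1
        or name.startswith(".")                      # e.g. .part-00000.crc
        or name.endswith("._COPYING_")               # files still being copied
    )
    # Parquet summary files are kept even though they start with "_".
    keep = name.startswith("_metadata") or name.startswith("_common_metadata")
    return exclude and not keep


def visible_entries(directory: str) -> list:
    """Names directly in `directory` that the rule above would NOT skip, sorted."""
    return sorted(n for n in os.listdir(directory) if not spark_skips(n))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in ["_SUCCESS", ".part-00000.crc", "part-00000-abc", "_metadata"]:
            open(os.path.join(d, n), "w").close()
        print(visible_entries(d))  # ['_metadata', 'part-00000-abc']
```

This only inspects one directory level, so it will not by itself reveal data that sits in a subdirectory.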

Any ideas?

Upvotes: 5

Views: 21415

Answers (3)

Prince

Reputation: 1

For me, it worked when I specified the properties (columns) manually, like below:

data = spark.read.parquet("/myhdfs/location/anotherlevel/").select("Property1", "Property2", "Property3")

Upvotes: 0

user18580758

Reputation: 1

I had the same problem, but none of the answers I found online worked for me. It turned out that I was writing the code like this:

data = spark.read.parquet("/myhdfs/location/anotherlevel/")

that is, using double quotes ("). When I switched to single quotes ('), my problem was solved:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')

Sharing in case it helps anybody.
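For reference, in Python single- and double-quoted string literals produce exactly the same `str`, so the quote style by itself should not change the path that Spark receives. A quick check using the path from this answer:

```python
# The same path written with single quotes and with double quotes
single = '/myhdfs/location/anotherlevel/'
double = "/myhdfs/location/anotherlevel/"

print(single == double)                     # True: identical contents
print(type(single) is type(double) is str)  # True: both are plain str
```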

Upvotes: 0

Taylrl

Reputation: 3919

It turns out I was getting this error because there was another level in the directory structure: the data sat one directory further down than the path I was reading. The following was what I needed:

data = spark.read.parquet('/myhdfs/location/anotherlevel/')
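Since the fix was pointing Spark at a directory one level down, one way to find where the data files actually live is to walk the tree and look for files that begin with the Parquet magic bytes `PAR1` (Parquet files start and end with them), skipping names that begin with `_` or `.`. This is a minimal sketch that assumes a local or mounted copy of the data; on HDFS itself you would list the tree with `hdfs dfs -ls -R` instead. `parquet_dirs` and `is_parquet_file` are hypothetical helper names.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # Parquet files begin (and end) with these 4 bytes


def is_parquet_file(path: str) -> bool:
    """True if the file at `path` starts with the Parquet magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(4) == PARQUET_MAGIC


def parquet_dirs(root: str) -> list:
    """Directories under `root` (relative paths, '.' for root itself) that
    directly contain at least one visible Parquet file."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into hidden directories such as _temporary
        dirnames[:] = [d for d in dirnames if not d.startswith(("_", "."))]
        if any(
            not f.startswith(("_", ".")) and is_parquet_file(os.path.join(dirpath, f))
            for f in filenames
        ):
            found.append(os.path.relpath(dirpath, root))
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "anotherlevel")
        os.makedirs(sub)
        with open(os.path.join(sub, "part-00000-abc"), "wb") as fh:
            fh.write(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)  # fake file
        open(os.path.join(root, "_SUCCESS"), "w").close()
        print(parquet_dirs(root))  # ['anotherlevel']
```

If the result lists a subdirectory rather than `.`, that subdirectory is likely the path to pass to `spark.read.parquet`.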

Upvotes: 7
