Reputation: 3919
This question has a different answer from those given in the related post linked below.
I am getting an error that reads
pyspark.sql.utils.AnalysisException: u'Unable to infer schema for Parquet. It must be specified manually.;'
when I try to read in a Parquet file like this, using Spark 2.1.0:
data = spark.read.parquet('/myhdfs/location/')
I have checked and the file/table is not empty by looking at the impala table through the Hue WebPortal. Also, other files that I have stored in similar directories read absolutely fine. For the record, the file names contain hyphens but no underscores or full-stops/periods.
Hence, none of the answers in the following post apply: "Unable to infer schema when loading Parquet file".
Any ideas?
Upvotes: 5
Views: 21415
Reputation: 1
For me, it worked when I specified the properties manually, as below:
data = spark.read.parquet("/myhdfs/location/anotherlevel/").select("Property1", "Property2", "Property3")
Upvotes: 0
Reputation: 1
I got the same problem but none of the answers I found online worked for me. It turns out that I was writing the code in this way:
data = spark.read.parquet("/myhdfs/location/anotherlevel/")
that is, with double quotes ("). When I switched to single quotes ('), my problem was solved.
data = spark.read.parquet('/myhdfs/location/anotherlevel/')
Sharing in case it helps anybody
Upvotes: 0
Reputation: 3919
It turns out I was getting this error because there was another level in the directory structure, so the path I passed contained no Parquet files directly. This is what I needed:
data = spark.read.parquet('/myhdfs/location/anotherlevel/')
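If it is not obvious which level actually holds the data, a small helper can narrow it down. The following is only a sketch under some assumptions: the function name parquet_dirs and the sample listing are hypothetical, the file paths are assumed to come from a recursive listing such as `hdfs dfs -ls -R`, and the data files are assumed to end in .parquet or .parq. It skips names starting with an underscore or a dot (for example _SUCCESS), since those are marker or metadata files rather than data.

```python
import posixpath


def parquet_dirs(file_paths, suffixes=(".parquet", ".parq")):
    """Return the sorted directories that directly contain Parquet data files.

    `suffixes` is an assumption about how the data files are named.
    """
    dirs = set()
    for path in file_paths:
        name = posixpath.basename(path)
        # Skip marker/metadata files such as _SUCCESS or .part-0.crc.
        if name.startswith(("_", ".")):
            continue
        if name.endswith(suffixes):
            dirs.add(posixpath.dirname(path))
    return sorted(dirs)


# Hypothetical listing, e.g. collected from `hdfs dfs -ls -R /myhdfs/location/`.
listing = [
    "/myhdfs/location/anotherlevel/_SUCCESS",
    "/myhdfs/location/anotherlevel/part-00000-abc.snappy.parquet",
    "/myhdfs/location/anotherlevel/part-00001-abc.snappy.parquet",
]
candidates = parquet_dirs(listing)
print(candidates)  # ['/myhdfs/location/anotherlevel']

# With a SparkSession available, the result could then be read with:
# data = spark.read.parquet(*candidates)
```

In this example nothing sits directly under /myhdfs/location/, which matches the situation above: the read has to point at the level that actually contains the files.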
Upvotes: 7