bda

Reputation: 422

Spark persistent view on a partitioned parquet file

In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure Blob storage? The view must still be available after the cluster is restarted, without having to re-create it, so it cannot be a temp view.

I can create a temp view, but not a persistent view. The following code returns an exception.

spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")

ParseException: mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)
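For reference, the temp view I can create successfully looks roughly like this (a minimal sketch with the same mount path; the view name test_tmp is just illustrative):

# "test_tmp" is an illustrative name; a temporary view does not survive a cluster restart
spark.sql("CREATE TEMPORARY VIEW test_tmp USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")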

Big thank you for taking a look :)

Upvotes: 0

Views: 1509

Answers (1)

Saideep Arikontham

Reputation: 6104

The syntax you used works for temporary views but not for persistent views. I hit the same error, ParseException: mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}, when I tried to reproduce this with a similar syntax.

  • Using the syntax given for CREATE VIEW in this official Microsoft documentation, I created a persistent view over a data source (using my mount point and a CSV file in it) as follows:
spark.sql("CREATE VIEW demo2 as select * from csv.`/mnt/repro/op.csv`")

Output image: [screenshot]

  • So, you can modify your query to the following statement (a full spark.sql example follows this list):
CREATE VIEW test as select * from parquet.`/mnt/folder/file.c000.snappy.parquet`
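As a rough end-to-end sketch (the path is the one from your question; the verification query is just illustrative), the corrected statement can be run through spark.sql, and the view can be queried again after a restart without being re-created:

# Create the persistent view over the parquet file path from the question
spark.sql("CREATE VIEW test as select * from parquet.`/mnt/folder/file.c000.snappy.parquet`")

# After a cluster restart the view definition is still in the metastore,
# so it can be queried without being re-created
spark.sql("SELECT * FROM test").show()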

Upvotes: 2
