bda

Reputation: 422

Spark persistent view on a partitioned parquet file

In Spark, is it possible to create a persistent view on a partitioned parquet file in Azure Blob storage? The view must still be available after the cluster is restarted, without having to re-create it, so it cannot be a temp view.

I can create a temp view, but not a persistent view. The following code returns an exception.

spark.sql("CREATE VIEW test USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")

ParseException: mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}(line 1, pos 23)
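For reference, the temp view I can create successfully looks roughly like this (a minimal sketch with the same mount path; the view name test_tmp is just illustrative):

# "test_tmp" is an illustrative name; a temporary view does not survive a cluster restart
spark.sql("CREATE TEMPORARY VIEW test_tmp USING parquet OPTIONS (path \"/mnt/folder/file.c000.snappy.parquet\")")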

Big thank you for taking a look :)

Upvotes: 0

Views: 1509

Answers (1)

Saideep Arikontham

Reputation: 6104

The syntax you used works for temporary views but not for persistent views. I hit the same error, ParseException: mismatched input 'USING' expecting {'(', 'UP_TO_DATE', 'AS', 'COMMENT', 'PARTITIONED', 'TBLPROPERTIES'}, when I tried to reproduce this with a similar syntax.

  • Using the syntax given for CREATE VIEW in this official Microsoft documentation, I created a persistent view over a data source (using my mount point and a CSV file in it) as follows:
spark.sql("CREATE VIEW demo2 as select * from csv.`/mnt/repro/op.csv`")

Output image: [screenshot]

  • So, you can modify your query to the following statement (a full spark.sql example follows this list):
CREATE VIEW test as select * from parquet.`/mnt/folder/file.c000.snappy.parquet`
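As a rough end-to-end sketch (the path is the one from your question; the verification query is just illustrative), the corrected statement can be run through spark.sql, and the view can be queried again after a restart without being re-created:

# Create the persistent view over the parquet file path from the question
spark.sql("CREATE VIEW test as select * from parquet.`/mnt/folder/file.c000.snappy.parquet`")

# After a cluster restart the view definition is still in the metastore,
# so it can be queried without being re-created
spark.sql("SELECT * FROM test").show()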

Upvotes: 2
