user17955942

Reputation: 53

Is it possible to create a view from external data?

I have some CSV files in my data lake that are updated quite frequently by another process. Ideally, I would like to query these files through Spark SQL without having to run an equally frequent batch process that loads all the new files into a Spark table.

Looking at the documentation, I'm unsure whether this is possible, as all the examples show views that query existing tables or other views rather than loose files stored in a data lake.

Upvotes: 0

Views: 688

Answers (1)

Elmar Macek

Reputation: 380

You can do something like this if your CSV files are in S3 under the location s3://bucket/folder:

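// Hive-style table defined over the files already in the folder
// (this syntax generally needs a SparkSession with Hive support enabled)
spark.sql(
  """
  CREATE TABLE test2
  (a string, b string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ","
  LOCATION 's3://bucket/folder'
  """
)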
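
Since the question asks about a view specifically, a temporary view defined directly over the files may also be worth trying. This is only a minimal sketch, assuming Spark's built-in csv data source: the names CsvViewSketch, createCsvView and csv_view are made up for illustration, and the two string columns just mirror the table above.

```scala
import org.apache.spark.sql.SparkSession

object CsvViewSketch {
  // Registers a session-scoped view that reads the CSV files under `path`.
  // `csv_view` and the columns (a, b) are placeholders, not part of the
  // original answer; adjust them to the real file layout.
  def createCsvView(spark: SparkSession, path: String): Unit = {
    spark.sql(
      s"""
      CREATE OR REPLACE TEMPORARY VIEW csv_view (a STRING, b STRING)
      USING csv
      OPTIONS (path '$path', header 'false', sep ',')
      """
    )
  }
}
```

You would call CsvViewSketch.createCsvView(spark, "s3://bucket/folder") once per session and then query it with spark.sql("SELECT a, b FROM csv_view"). The view only lives as long as the SparkSession, so nothing has to be loaded into a managed table.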

You will have to adapt the columns and the field separator to match your files, though. To test it, you can first run:

import spark.implicits._  // needed for toDF outside of spark-shell
Seq(("1","a"), ("2","b"), ("3","a"), ("4","b")).toDF("num", "char")
  .repartition(1).write.mode("overwrite").csv("s3://bucket/folder")
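
Once the test files are written, the table can be queried like any other. Because the files keep changing underneath it, Spark may be working from a cached file listing, so refreshing before each read is probably a good idea. A rough sketch, assuming the test2 table from above already exists; the object and function names (QueryFresh, readFresh) are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object QueryFresh {
  // Invalidates Spark's cached metadata for `table`, then reads it again.
  // The default table name `test2` comes from the CREATE TABLE above.
  def readFresh(spark: SparkSession, table: String = "test2"): DataFrame = {
    spark.sql(s"REFRESH TABLE $table")
    spark.sql(s"SELECT * FROM $table")
  }
}
```

Calling QueryFresh.readFresh(spark).show() should then list the four test rows, and later calls should pick up whatever the other process has written to the folder in the meantime.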

Upvotes: 1
