Benjamin Du

Reputation: 1891

How to specify multiple paths to a table in Spark SQL

To be clear, I'm asking about the syntax of Spark SQL, not the Spark DataFrame API.

We know that Spark SQL can query a Parquet (text, etc.) file directly, and below is an example. What if there are two paths that I want to query as a single table?

select
    *
from
    parquet.`path_1`

Upvotes: 0

Views: 1598

Answers (1)

James Tobin

Reputation: 3110

This is in Python so I can show the variables, but the SQL portion will be the same. I'm assuming that you just want the data appended together? If so:

pth1 = '/path/to/location1/part-r-00000-bf53578.gz.parquet'
pth2 = '/path/to/location2/part-r-00001-bf265.gz.parquet'
# Query both Parquet files directly in SQL and combine them into one result set.
sqlContext.sql("""
    select * from parquet.`hdfs://{0}`
    union
    select * from parquet.`hdfs://{1}`
    """.format(pth1, pth2)).show()
+----+----+------------+
|col1|col2|        col3|
+----+----+------------+
|   2|   b|9.0987654321|
|   1|   a| 4.123456789|
+----+----+------------+
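One caveat: UNION removes duplicate rows across the two files. If you want a plain append that keeps every row, use UNION ALL instead; the only change is the set operator:

sqlContext.sql("""
    select * from parquet.`hdfs://{0}`
    union all
    select * from parquet.`hdfs://{1}`
    """.format(pth1, pth2)).show()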

In pure Spark SQL, the original query would look like:

SELECT * 
FROM parquet.`hdfs:///path/to/location1/part-r-00000-bf53578.gz.parquet`
UNION 
SELECT * 
FROM parquet.`hdfs:///path/to/location2/part-r-00001-bf265.gz.parquet`
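If the two files sit under a common parent directory, a Hadoop glob pattern in the path may also work, since Spark's file sources resolve paths with globbing; treat this as a sketch to verify on your Spark version, and note that the wildcard path below is only a placeholder:

SELECT *
FROM parquet.`hdfs:///path/to/location*/part-r-*.gz.parquet`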

Upvotes: 1
