To be clear, I'm asking about Spark SQL syntax, not the DataFrame API.
Spark SQL can query a Parquet (or text, etc.) file directly, as in the example below. What if I have two paths that I want to query as a single table?
SELECT *
FROM parquet.`path_1`
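This is in Python so I can show the variables, but the SQL part is the same. I'm assuming you just want to append the data from both paths into one table. If so (note that UNION drops duplicate rows; use UNION ALL if you want to keep every row from both files):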
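# Assumes a running PySpark session where sqlContext is available (e.g. the pyspark shell)
pth1 = '/path/to/location1/part-r-00000-bf53578.gz.parquet'
pth2 = '/path/to/location2/part-r-00001-bf265.gz.parquet'
# Read each Parquet file directly in SQL and stack the results into one table.
# UNION removes duplicate rows; switch to UNION ALL to keep them.
sqlContext.sql("""
select * from parquet.`hdfs://{0}`
union
select * from parquet.`hdfs://{1}`
""".format(pth1, pth2)).show()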
+----+----+------------+
|col1|col2| col3|
+----+----+------------+
| 2| b|9.0987654321|
| 1| a| 4.123456789|
+----+----+------------+
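If there are more than two paths, the same pattern generalizes: one `SELECT * FROM parquet.`...`` per path, joined by the set operator. The sketch below builds that query string in plain Python; `build_union_query` and its parameters are hypothetical names for illustration, not part of Spark. It assumes the paths are full URIs and contain no backticks.

```python
def build_union_query(paths, fmt="parquet", distinct=False):
    """Build a Spark SQL query that reads each path directly and stacks them.

    Hypothetical helper (not a Spark API). `paths` are full URIs such as
    'hdfs:///path/to/file.parquet' and must not contain backticks.
    distinct=False joins with UNION ALL (append every row);
    distinct=True joins with UNION (drop duplicate rows).
    """
    if not paths:
        raise ValueError("need at least one path")
    op = "UNION" if distinct else "UNION ALL"
    # One SELECT per path, querying the file directly as <fmt>.`<path>`
    selects = ["SELECT * FROM {0}.`{1}`".format(fmt, p) for p in paths]
    return ("\n" + op + "\n").join(selects)


paths = [
    "hdfs:///path/to/location1/part-r-00000-bf53578.gz.parquet",
    "hdfs:///path/to/location2/part-r-00001-bf265.gz.parquet",
]
query = build_union_query(paths)
print(query)
# With a live session you would then run: sqlContext.sql(query).show()
```

Building the string separately keeps the SQL readable and lets you switch between UNION and UNION ALL in one place.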
In pure Spark SQL (e.g. the spark-sql CLI) it would look like:
SELECT *
FROM parquet.`hdfs:///path/to/location1/part-r-00000-bf53578.gz.parquet`
UNION
SELECT *
FROM parquet.`hdfs:///path/to/location2/part-r-00001-bf265.gz.parquet`
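Because the goal is usually to append, it matters whether you write UNION or UNION ALL. The sketch below shows the difference using Python's built-in `sqlite3` as a stand-in for Spark SQL (an assumption for illustration; both follow standard SQL semantics here). The tables `part1` and `part2` and their rows are hypothetical sample data.

```python
import sqlite3

# sqlite3 stands in for Spark SQL; UNION vs UNION ALL behave the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE part2 (col1 INTEGER, col2 TEXT)")
# Hypothetical sample rows; (2, 'b') appears in both "files".
conn.executemany("INSERT INTO part1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO part2 VALUES (?, ?)", [(2, "b"), (3, "c")])

# UNION removes duplicate rows across both inputs.
union_rows = conn.execute(
    "SELECT * FROM part1 UNION SELECT * FROM part2 ORDER BY col1"
).fetchall()

# UNION ALL keeps every row, i.e. it simply appends part2 to part1.
union_all_rows = conn.execute(
    "SELECT * FROM part1 UNION ALL SELECT * FROM part2 ORDER BY col1"
).fetchall()

print(union_rows)      # [(1, 'a'), (2, 'b'), (3, 'c')]
print(union_all_rows)  # [(1, 'a'), (2, 'b'), (2, 'b'), (3, 'c')]
conn.close()
```

UNION ALL also avoids the extra deduplication step, so it is typically the cheaper choice when the files are known not to overlap.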