Reputation:
I created a table like this:
spark.sql("""create table temp stored as parquet TBLPROPERTIES('parquet.compression'='snappy') AS...""")
and I can query the table. However, when I use this table in a subquery, like
spark.sql("""
SELECT DISTINCT
...
AND col in (select * from temp)
...
""")
I get this error:
org.apache.spark.sql.AnalysisException: Table or view not found: `temp`
Is this a limitation of Spark? (The original table and column names have been replaced for confidentiality.)
Upvotes: 0
Views: 1954
Reputation: 2853
Yes, nested queries are supported by Spark SQL.
Since you didn't give a complete example, consider the one below.
Consider the data set below, in a table named temp, created by the CREATE statement:
+---+-------------+------+
| id| name|gender|
+---+-------------+------+
| 1| John Smith| Male|
| 2| John Smith| null|
| 3|Rebecca Davis|Female|
| 4| Paul Adams| Male|
| 5| null| null|
+---+-------------+------+
To get the records whose name appears exactly once, you can write something like this:
spark.sql("""SELECT T.* FROM temp AS T
RIGHT JOIN
(SELECT name, COUNT(name) AS count_name FROM temp GROUP BY name HAVING count_name = 1) AS sq
ON sq.name=T.name""")
This will give you:
+---+-------------+------+
| id| name|gender|
+---+-------------+------+
| 3|Rebecca Davis|Female|
| 4| Paul Adams| Male|
+---+-------------+------+
For your example, I would suggest double-checking your queries, in particular that the table was created in the same session and database you are querying it from.
Upvotes: 2