user15851574

Are nested queries supported in Spark SQL?

I made a table like

spark.sql("""create table temp stored as parquet TBLPROPERTIES('parquet.compression'='snappy') AS...""")

and I can query the table. However, when I use this table in a subquery like

spark.sql("""
SELECT DISTINCT 
...
AND col in (select * from temp)
...
""")

I get this error

org.apache.spark.sql.AnalysisException: Table or view not found: `temp`

Is this a limitation of Spark? (The original table and column names have been replaced for confidentiality.)

Upvotes: 0

Views: 1954

Answers (1)

Chitral Verma

Reputation: 2853

Yes, nested queries are supported in Spark SQL.

Since you didn't share a complete example, consider the sample data below, stored in a table named temp created with a statement like yours.

+---+-------------+------+
| id|         name|gender|
+---+-------------+------+
|  1|   John Smith|  Male|
|  2|   John Smith|  null|
|  3|Rebecca Davis|Female|
|  4|   Paul Adams|  Male|
|  5|         null|  null|
+---+-------------+------+

To get the records whose name appears exactly once, you can write something like the following:

spark.sql("""SELECT T.* FROM temp AS T 
              RIGHT JOIN 
              (SELECT name, COUNT(name) AS count_name FROM temp GROUP BY name HAVING count_name = 1) AS sq 
              ON sq.name=T.name""")

This will give you,

+---+-------------+------+
| id|         name|gender|
+---+-------------+------+
|  3|Rebecca Davis|Female|
|  4|   Paul Adams|  Male|
+---+-------------+------+
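The same filter can also be written with an IN subquery, closer to the shape in your question. Here is a minimal, self-contained sketch of that logic; it uses Python's built-in sqlite3 module only so it runs without a Spark cluster, and the same SELECT should also work when passed to spark.sql against the temp table:

```python
import sqlite3

# Build the sample data set from the answer in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp (id INTEGER, name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO temp VALUES (?, ?, ?)",
    [
        (1, "John Smith", "Male"),
        (2, "John Smith", None),
        (3, "Rebecca Davis", "Female"),
        (4, "Paul Adams", "Male"),
        (5, None, None),
    ],
)

# Nested query: keep only rows whose name occurs exactly once.
# COUNT(name) ignores NULLs, so the NULL-name row is excluded too.
result = conn.execute(
    """
    SELECT id, name, gender
    FROM temp
    WHERE name IN (SELECT name
                   FROM temp
                   GROUP BY name
                   HAVING COUNT(name) = 1)
    ORDER BY id
    """
).fetchall()
print(result)  # [(3, 'Rebecca Davis', 'Female'), (4, 'Paul Adams', 'Male')]
```

This returns the same two rows as the RIGHT JOIN version above, which shows the subquery itself is ordinary SQL; the question's error is about table resolution, not subquery support.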

For your case, I'd suggest checking the queries again: that AnalysisException means Spark could not resolve the table temp in the current session or database, not that nested queries are unsupported.

Upvotes: 2
