Reputation: 1396
Good day,
I wish to check if my variable exists in either of two tables and get the result in single table for furture processing. I figured that simple:
'''
select concent, concent_big from
(select count(*) as concent where core_id = "{}" ) as a
left join
(select count(*) as concent_big ,concent_2 where core_id = "{}" ) as b
on 1 = 1
'''
However this does not seem be allowed. It's a bit confusing, since I've done similar things in Sql previously. Now pySpark is giving me hard time. I came up with work-around, but it's (imho) silly:
'''
select concent, concent_big from
(select count(*) as concent, 1 as tmp_key from concent where core_id = "{}" ) as a
left join
(select count(*) as concent_big , 1 as tmp_key from concent_2 where core_id = "{}" ) as b
on a.tmp_key = b.tmp_key
'''
Any ideas how to do this more elegantly?
Upvotes: 0
Views: 2049
Reputation: 1962
Why not just use crossJoin
? A word of warning - this results in a full cartesian product, so your table might explode in size, but that appears to be the desired effect in your case. You can read up on it here: https://spark.apache.org/docs/2.4.3/api/python/pyspark.sql.html#pyspark.sql.DataFrame.crossJoin
EDIT: When using Spark SQL, the language follows ANSI SQL standards, therefore the command becomes CROSS JOIN
.
Hope this helps.
Upvotes: 1