pinegulf

Reputation: 1396

Join on 1 = 1 in PySpark

Good day,

I wish to check whether my variable exists in either of two tables and get the result in a single table for further processing. I figured that a simple:

'''
   select concent, concent_big from
     (select count(*) as concent from concent where core_id = "{}" ) as a
   left join
     (select count(*) as concent_big from concent_2 where core_id = "{}" ) as b
   on 1 = 1
'''

However this does not seem to be allowed. It's a bit confusing, since I've done similar things in SQL before. Now PySpark is giving me a hard time. I came up with a work-around, but it's (imho) silly:

 '''
    select concent, concent_big  from 
      (select count(*) as concent, 1 as tmp_key from concent where core_id = "{}" ) as a
    left join 
      (select count(*) as concent_big , 1 as tmp_key from concent_2 where core_id = "{}" ) as b
    on a.tmp_key = b.tmp_key
 '''

Any ideas how to do this more elegantly?

Upvotes: 0

Views: 2049

Answers (1)

Napoleon Borntoparty

Reputation: 1962

Why not just use crossJoin? A word of warning: this produces a full Cartesian product, so the result can explode in size. With two single-row aggregates like yours, however, that is exactly the desired effect. You can read up on it here: https://spark.apache.org/docs/2.4.3/api/python/pyspark.sql.html#pyspark.sql.DataFrame.crossJoin

EDIT: When using Spark SQL, the language follows ANSI SQL standards, therefore the command becomes CROSS JOIN.
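So in your Spark SQL string, the work-around with the dummy `tmp_key` could become (keeping your table and column names):

```
select a.concent, b.concent_big from
  (select count(*) as concent from concent where core_id = "{}" ) as a
cross join
  (select count(*) as concent_big from concent_2 where core_id = "{}" ) as b
```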

Hope this helps.

Upvotes: 1

Related Questions