Why is the count different for a cross join in PySpark with or without a join condition?

Question

from pyspark.sql.types import StringType

# `spark` is an existing SparkSession
dfj3 = spark.createDataFrame(['a', 'b', 'b'], StringType())
dfj4 = spark.createDataFrame(['c', 'd', 'e'], StringType())

dfj3.join(dfj4).count()  # cross join, count = 9
dfj3.join(dfj4, dfj3.value == dfj4.value).count()  # inner join, count = 0
dfj3.join(dfj4, dfj3.value == dfj4.value, 'cross').count()  # cross join with condition, count = 0

Why do the 1st and 3rd joins behave differently?

Expected: a cross join with a join condition and a cross join without one should return the same count, since the join is performed across all records in both tables.


Answers (1)
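
The counts in the question are consistent with reading a join as "build every pair, then keep the pairs that satisfy the condition". The sketch below is a plain-Python model of that reading, not Spark itself. It assumes that Spark still applies a supplied join condition as a filter when the join type is 'cross'. The helper `join_model` and the list names are hypothetical and used only for illustration.

```python
from itertools import product

# Same values as dfj3 and dfj4 in the question
left = ['a', 'b', 'b']
right = ['c', 'd', 'e']

def join_model(left_rows, right_rows, condition=None):
    """Model a join as a cartesian product followed by an optional filter.

    Assumption: this mirrors how a join condition narrows a cross join;
    it is an illustration, not Spark's actual implementation.
    """
    pairs = product(left_rows, right_rows)
    if condition is None:
        # No condition: every pair survives (3 x 3 pairs here)
        return list(pairs)
    # With a condition: only pairs that satisfy it survive
    return [(l, r) for l, r in pairs if condition(l, r)]

no_condition = join_model(left, right)
with_condition = join_model(left, right, lambda l, r: l == r)

print(len(no_condition))    # 9
print(len(with_condition))  # 0
```

Under this model the third call in the question returns 0 because no value in dfj3 equals any value in dfj4, so the condition removes all 9 pairs. If the goal is the full cartesian product, leaving the condition out (or calling `dfj3.crossJoin(dfj4)`) should give 9.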
