BSP

Reputation: 775

Pyspark N auto join

I have the following dataframes:

df1:
src | dst
 A  |  B
 A  |  C

df2:
src | dst
 B  |  D
 B  |  C
 C  |  D

df3:
src | dst
 D  |  A
 C  |  D

I would like to join the three (or N) dataframes to get:

output:
src | dst
 A  |  B
 A  |  C
 B  |  D
 C  |  D
 D  |  A

I have tried several join options (mainly left semi), but I have not succeeded.

Upvotes: 0

Views: 112

Answers (1)

Somy

Reputation: 1624

I think you need a "union all" of the dataframes followed by a distinct:

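    // Stack df1 and df2 (same columns, same order), then drop duplicate rows
    val df4 = df1.union(df2).distinct

    // Add df3 and de-duplicate again
    val df5 = df3.union(df4).distinct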

df5 would be your final data frame.
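The snippet above is Scala; since the question is about PySpark and N dataframes, here is a rough Python sketch of the same idea. `union_distinct` is just a name I made up for illustration, and I am assuming every dataframe has the same columns in the same order, because `union` matches columns by position.

```python
from functools import reduce


def union_distinct(dataframes):
    """Union any number of same-schema dataframes and drop duplicate rows.

    Hypothetical helper: it only relies on each input having the
    .union(other) and .distinct() methods that PySpark DataFrames provide.
    """
    dataframes = list(dataframes)
    if not dataframes:
        raise ValueError("need at least one dataframe")
    # Fold the list pairwise: ((df1 union df2) union df3) ...
    # union() keeps duplicates, like SQL UNION ALL.
    combined = reduce(lambda left, right: left.union(right), dataframes)
    # A single distinct() at the end removes the repeated rows.
    return combined.distinct()
```

With the three dataframes from the question, `union_distinct([df1, df2, df3])` should give six distinct rows. That includes `B | C`, which the expected output in the question leaves out, so if that row really has to go, a union alone will not do it and some extra filter would be needed.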

Let me know if this works.

Upvotes: 1
