Reputation: 7235
I have two pyspark
dataframes df1
and df2
df1
id1 id2 id3 x y
0 1 2 0.5 0.4
2 1 0 0.3 0.2
3 0 2 0.8 0.9
2 1 3 0.2 0.1
df2
id name
0 A
1 B
2 C
3 D
I would like to join the two dataframes and have
df3
id1 id2 id3 n1 n2 n3 x y
0 1 2 A B C 0.5 0.4
2 1 0 C B A 0.3 0.2
3 0 2 D A C 0.8 0.9
2 1 3 C B D 0.2 0.1
Upvotes: 0
Views: 83
Reputation: 13551
Here is the multiple joins.
df1.join(df2, df1['id1'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n1') \
.join(df2, df1['id2'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n2') \
.join(df2, df1['id3'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n3') \
.show()
+---+---+---+---+---+---+---+---+
|id1|id2|id3| x| y| n1| n2| n3|
+---+---+---+---+---+---+---+---+
| 0| 1| 2|0.5|0.4| A| B| C|
| 2| 1| 0|0.3|0.2| C| B| A|
| 3| 0| 2|0.8|0.9| D| A| C|
| 2| 1| 3|0.2|0.1| C| B| D|
+---+---+---+---+---+---+---+---+
Upvotes: 1