Reputation: 4353
I have two pyspark dataframes:
| A  | B   | C    |
|----|-----|------|
| 21 | 999 | 1000 |
| 22 | 786 | 1978 |
| 23 | 345 | 1563 |
and
| A  | D   | E   |
|----|-----|-----|
| 21 | aaa | a12 |
| 22 | bbb | b43 |
| 23 | ccc | h67 |
Desired result:
| A  | B   | C    | E   |
|----|-----|------|-----|
| 21 | 999 | 1000 | a12 |
| 22 | 786 | 1978 | b43 |
| 23 | 345 | 1563 | h67 |
I tried using `join`, even `df1.join(df2.E, df1.A == df2.A)`, to no avail.
Upvotes: 2
Views: 4602
Reputation: 3100
When you join two dataframes with the `join` function, it takes three arguments: the other dataframe, the join condition, and the join type. Sample code below:

`df1.join(df2, df1.id == df2.id, 'outer')`
You can find more details here.
Regards,
Neeraj
Upvotes: 3
Reputation: 1983
I think this code does what you want:
joinedDF = df1.join(df2.select('A', 'E'), ['A'])
Upvotes: 3