Qubix

Reputation: 4353

Join a dataframe with a column from another, based on a common column

I have two pyspark dataframes:

|  A  |  B  |  C  |
| 21  | 999 | 1000|
| 22  | 786 | 1978|
| 23  | 345 | 1563|

and

|  A  |  D  |  E  |
| 21  | aaa | a12 |
| 22  | bbb | b43 |
| 23  | ccc | h67 |

Desired result:

|  A  |  B  |  C  |  E  |
| 21  | 999 | 1000| a12 |
| 22  | 786 | 1978| b43 |
| 23  | 345 | 1563| h67 |

I tried using join, including df1.join(df2.E, df1.A == df2.A), to no avail.

Upvotes: 2

Views: 4602

Answers (2)

Neeraj Bhadani

Reputation: 3100

When you join two dataframes using the join function, it takes three arguments:

  1. arg-1 : the other dataframe you want to join with.
  2. arg-2 : the column(s) or condition on which to join the dataframes.
  3. arg-3 : the type of join to perform; by default it is an inner join.

Here is some sample code:

df1.join(df2, df1.id == df2.id, 'outer')

You can find more details here.

Regards,

Neeraj

Upvotes: 3

Ali AzG

Reputation: 1983

I think this code does what you want:

joinedDF = df1.join(df2.select('A', 'E'), ['A'])

Upvotes: 3
