Leevo

Reputation: 1753

PySpark: How to join dataframes with column names stored in other variables

I am learning PySpark. I need to left join two dataframes, say A and B, on their respective columns colname_a and colname_b. Normally, I would do it like this:

# create a new dataframe AB:
AB = A.join(B, A.colname_a == B.colname_b, how = 'left')

However, the column names are not directly available to me. They are stored as strings in a separate module, and I have to access them like this:

module.COLNAME_A   # contains string with colname of A
module.COLNAME_B   # contains string with colname of B

How can I use these string values in the command above to join the dataframes?

Upvotes: 1

Views: 2855

Answers (1)

akuiper

Reputation: 214987

Use square-bracket indexing instead of dot notation to access the columns by name:

AB = A.join(B, A[module.COLNAME_A] == B[module.COLNAME_B], how = 'left')

Upvotes: 5
