I am trying to learn PySpark. I need to left join two DataFrames, let's say A and B, on their respective columns colname_a and colname_b. Normally, I would do it like this:
# create a new DataFrame AB by left-joining A and B:
AB = A.join(B, A.colname_a == B.colname_b, how='left')
However, the column names are not directly available to me. They are stored in a separate module, and I have to reference them like this:
module.COLNAME_A # contains string with colname of A
module.COLNAME_B # contains string with colname of B
How can I use these string values in the join above?
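Use square-bracket indexing instead of dot notation to access the columns. Dot notation (A.colname_a) only works when the column name is written literally in the code, whereas A[...] accepts any string, including one stored in a variable or module constant: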
AB = A.join(B, A[module.COLNAME_A] == B[module.COLNAME_B], how='left')