Reputation: 923
I get this
final = ta.join(tb, on=['ID'], how='left')
both left an right have a 'ID' column of the same name.
And I get this
final = ta.join(tb, ta.leftColName == tb.rightColName, how='left')
The left & right column names are known before runtime so the column names can be hard coded.
But what if the left and right column names of the on predicate are different and are calculated/ derived by configuration variables? Such as:
1) leftColName = 'leftKey'
2) rightColName = 'rightKey'
3) final = ta.join(tb, ta.leftColname == tb.rightColname, how='left')
The values of leftColName
& rightColName
are not know before line 3 can be hardcoded and executed.
This doesn't work because I find runtime can intermittently get confused/lost in whether rightColName
refers to ta
or to tb
final = ta.join(tb, f.col(leftColName) == f.col(rightColName), 'left')
Scala appears to have a facility to enable this.
Upvotes: 2
Views: 12937
Reputation: 1
If doing on multiple columns, then following list comprehension works
df1 = df10_fj.select(*(f.col(x).alias(x + "_df10") for x in df10_fj.columns))
df2 = df4_fj.select(*(f.col(x).alias(x + "_df4") for x in df4_fj.columns))
# df1, on=[df1.DOB_df10 == df2.DOB_df4, df1.soundex_lnm_df10 == df2.soundex_lnm_df4]
df = df2.join(df1, on=[df1[f"{c}_df10"] == df2[f"{c}_df4"] for c in join_cols])```
Upvotes: 0
Reputation: 221
You are referencing the column as ta.leftColname
, but - similarly to Pandas - you could also reference it by ta["leftColname"]
.
This way, instead of a hardcoded column name, you can also use a variable. For example:
left_key = 'leftColname'
right_key = 'rightColname'
final = ta.join(tb, ta[left_key] == tb[right_key], how='left')
Upvotes: 11