Reputation: 354
I have two DataFrames, left_df and right_df, that share the columns ['col_1', 'col_2'] to join on, and I also want to join on an additional condition: right_df.col_3.between(left_df.col_4, left_df.col_5)
Code:
from pyspark.sql import functions as F
join_condition = ['col_1',
'col_2',
right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
df.write.parquet('/tmp/my_df')
But I got the error below:
TypeError: Column is not iterable
Why can't I combine these three conditions?
Upvotes: 3
Views: 2155
Reputation: 42332
You cannot mix strings with Columns. The join expressions must be a list of strings or a list of Columns, not a mixture of both. You can convert the first two items to Column expressions instead, e.g.
from pyspark.sql import functions as F
# Express every condition as a Column so the list is homogeneous
join_condition = [left_df.col_1 == right_df.col_1,
                  left_df.col_2 == right_df.col_2,
                  right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')
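If there are many shared key columns, the equality expressions can be generated instead of written by hand. The following is only a minimal sketch: build_join_condition is a hypothetical helper name, not part of PySpark, and it assumes both DataFrames support df['name'] lookups that return Columns.

```python
def build_join_condition(left_df, right_df, key_columns, extra_conditions=()):
    """Return one homogeneous list of conditions for DataFrame.join(on=...).

    Hypothetical helper (not a PySpark API): each shared key name is turned
    into an equality between the left and right column, and any additional
    Column conditions are appended unchanged.
    """
    # left_df[name] == right_df[name] yields a Column when both are DataFrames
    equalities = [left_df[name] == right_df[name] for name in key_columns]
    return equalities + list(extra_conditions)


# Usage with the DataFrames from the question (requires a Spark session):
# join_condition = build_join_condition(
#     left_df, right_df, ['col_1', 'col_2'],
#     [right_df.col_3.between(left_df.col_4, left_df.col_5)])
# df = left_df.join(right_df, on=join_condition, how='left')
```

Note that, unlike a join on a list of column names, a join on Column expressions keeps col_1 and col_2 from both sides in the result, so you may want to select or drop the duplicates afterwards.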
Upvotes: 3