Florian

Reputation: 354

Pyspark join with mixed conditions

I have two dataframes, left_df and right_df, with common columns to join on: ['col_1', 'col_2']. I also want to join on an additional condition: right_df.col_3.between(left_df.col_4, left_df.col_5).

Code:

from pyspark.sql import functions as F

join_condition = ['col_1', 
                  'col_2', 
                  right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')

df.write.parquet('/tmp/my_df')

But I got the error below:

TypeError: Column is not iterable

Why can't I combine these three conditions?

Upvotes: 3

Views: 2155

Answers (1)

mck

Reputation: 42332

You cannot mix strings with Columns: the on argument must be a list of strings or a list of Columns, not a mixture of both. You can convert the first two items to column expressions instead, e.g.

# every element is now a Column, so the list is homogeneous
join_condition = [left_df.col_1 == right_df.col_1,
                  left_df.col_2 == right_df.col_2,
                  right_df.col_3.between(left_df.col_4, left_df.col_5)]

df = left_df.join(right_df, on=join_condition, how='left')
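One thing to watch out for: when the join keys are passed as strings, Spark keeps a single copy of each key column in the output, but with Column expressions the joined dataframe retains both copies of col_1 and col_2. A minimal sketch of one way to handle that, using hypothetical sample data and dataframe aliases (the alias names l and r and the sample values are assumptions, not from the original post):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data shaped like the dataframes in the question.
left_df = spark.createDataFrame(
    [(1, 'a', 0, 10), (2, 'b', 5, 15)],
    ['col_1', 'col_2', 'col_4', 'col_5'])
right_df = spark.createDataFrame(
    [(1, 'a', 3), (2, 'b', 20)],
    ['col_1', 'col_2', 'col_3'])

# Alias each side so the equi-join keys can be referenced
# unambiguously and the right-hand copies dropped afterwards.
l, r = left_df.alias('l'), right_df.alias('r')

join_condition = [F.col('l.col_1') == F.col('r.col_1'),
                  F.col('l.col_2') == F.col('r.col_2'),
                  F.col('r.col_3').between(F.col('l.col_4'), F.col('l.col_5'))]

df = (l.join(r, on=join_condition, how='left')
       .drop(F.col('r.col_1'))
       .drop(F.col('r.col_2')))

Here drop is called with a Column rather than a string, which removes only the right-hand copy of each key instead of both.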

Upvotes: 3
