Reputation: 5498
I have a DataFrame with the following schema:
root
|-- tasin: string (nullable = true)
|-- advertiser_id: decimal(38,10) (nullable = true)
|-- predicted_sp_sold_units: decimal(38,10) (nullable = true)
|-- predicted_sp_impressions: decimal(38,10) (nullable = true)
|-- predicted_sp_clicks: decimal(38,10) (nullable = true)
|-- predicted_sdc_sold_units: decimal(38,10) (nullable = true)
|-- predicted_sdc_impressions: decimal(38,10) (nullable = true)
|-- predicted_sdc_clicks: decimal(38,10) (nullable = true)
|-- predicted_sda_sold_units: decimal(38,10) (nullable = true)
|-- predicted_sda_impressions: decimal(38,10) (nullable = true)
|-- predicted_sda_clicks: decimal(38,10) (nullable = true)
|-- region_id: integer (nullable = true)
|-- marketplace_id: integer (nullable = true)
|-- dataset_date: date (nullable = true)
I am now using the select statement below. For each optional column, I want to select its value if the column is present in the DataFrame, and fill it with null otherwise. The DataFrame is stored in the variable df.
scores_df1 = df.select(
    col('marketplace_id'),
    col('region_id'),
    col('tasin'),
    col('advertiser_id'),
    col('predicted_sp_sold_units'),
    col('predicted_sp_impressions'),
    col('predicted_sp_clicks'),
    col('predicted_sdc_sold_units'),
    col('predicted_sdc_impressions'),
    col('predicted_sdc_clicks'),
    col('predicted_sda_sold_units'),
    col('predicted_sda_impressions'),
    col('predicted_sda_clicks'),
    when('sdcr_score' in df.columns is True, col('sdcr_score')).otherwise(lit(None)).alias('sdcr_score'),
    when('sdar_score' in df.columns is True, col('sdar_score')).otherwise(lit(None)).alias('sdar_score')
)
I am receiving the error <class 'TypeError'>: condition should be a Column
Please advise what is wrong.
Upvotes: 0
Views: 1809
Reputation: 960
The expression 'sdcr_score' in df.columns is True
is evaluated by Python before anything reaches Spark, and it returns a plain True/False.
So what you are actually passing to Spark is when(True, ...).
when() expects its first argument to be a Column that evaluates to a boolean, not a Python bool.
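A quick check in plain Python shows what when() actually receives. This is only a sketch: the list below is a made-up stand-in for df.columns. One extra detail worth knowing is that in and is are both comparison operators, so Python chains them and reads the expression as ('sdcr_score' in columns) and (columns is True), which is False even when the column is present:

```python
# Hypothetical stand-in for df.columns (a plain Python list of column names).
columns = ['tasin', 'advertiser_id', 'sdcr_score']

# Chained comparison: ('sdcr_score' in columns) and (columns is True)
chained = 'sdcr_score' in columns is True

# Plain membership test, without the trailing "is True"
plain = 'sdcr_score' in columns

print(type(chained), chained)  # <class 'bool'> False
print(type(plain), plain)      # <class 'bool'> True
```

Either way the result is a Python bool, never a Column, which is why when() raises the TypeError.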
You can wrap the condition in lit(),
which turns the Python True/False into a constant boolean Column that applies to every row.
Upvotes: 1