Reputation: 7255
I have a script which currently runs on a sample dataframe. Here's my code:
fd_orion_apps = fd_orion_apps.groupBy('msisdn', 'apps_id').pivot('apps_id').count().select('msisdn', *parameter_cut.columns).fillna(0)
After pivoting, some of the columns in parameter_cut may not exist in the dataframe fd_orion_apps, and the select then fails with an error like this:
Py4JJavaError: An error occurred while calling o1381.select.
: org.apache.spark.sql.AnalysisException: cannot resolve '`8602`' given input columns: [7537, 7011, 2658, 3582, 12120, 31049, 35010, 16615, 10003, 15067, 1914, 1436, 6032, 422, 10636, 10388, 877,...
Upvotes: 1
Views: 311
Reputation: 24386
You can separate the select
into a different step. Then you can use a conditional expression together with a list comprehension: keep the column if the pivot produced it, otherwise create it filled with zeros.
from pyspark.sql import functions as F

fd_orion_apps = fd_orion_apps.groupBy('msisdn', 'apps_id').pivot('apps_id').count()
fd_orion_apps = fd_orion_apps.select(
    'msisdn',
    # use the pivoted column when it exists, otherwise add a zero-filled column with that name
    *[c if c in fd_orion_apps.columns else F.lit(0).alias(c) for c in parameter_cut.columns]
)
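For reference, here is a minimal, self-contained sketch of the same approach. The sample data, the msisdn values, and the parameter_cut column names ('422', '8602') are made up for illustration only:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# hypothetical sample data: '8602' never appears in fd_orion_apps
fd_orion_apps = spark.createDataFrame(
    [('0811', '422'), ('0811', '877'), ('0812', '422')],
    ['msisdn', 'apps_id'],
)
parameter_cut = spark.createDataFrame([(0, 0)], ['422', '8602'])

pivoted = fd_orion_apps.groupBy('msisdn', 'apps_id').pivot('apps_id').count()
result = pivoted.select(
    'msisdn',
    # keep the pivoted column when it exists, otherwise fill it with a literal 0
    *[c if c in pivoted.columns else F.lit(0).alias(c) for c in parameter_cut.columns]
).fillna(0)
result.show()

Note that F.lit(0) already covers the columns missing from the pivot; fillna(0) is only needed for the nulls that the pivot itself produces in existing columns.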
Upvotes: 1