PySpark: how to pass column names dynamically to countDistinct

Question

I have a CSV file with the headers FileName, ColumnName, Rule and RuleDetails.

The Rule column holds several rules, such as NotNull, Max and Min. For the rule "Unique" there can be multiple columns, and I need to pass those columns to countDistinct.

If I pass the column names dynamically instead of hardcoding them, I get the error below:

AnalysisException: Column '`"SITEID", "ASSETNUM"`' does not exist. Did you mean one of the following? [spark_catalog.maximo_dq.Assets_new.ASSETNUM, spark_catalog.maximo_dq.Assets_new.HasLD, spark_catalog.maximo_dq.Assets_new.SITEID, spark_catalog.maximo_dq.Assets_new.Status, spark_catalog.maximo_dq.Assets_new.SerialNumber, spark_catalog.maximo_dq.Assets_new.Description, spark_catalog.maximo_dq.Assets_new.InstallDate, spark_catalog.maximo_dq.Assets_new.Classification, spark_catalog.maximo_dq.Assets_new.LongDescription];
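The error message suggests that the whole string `"SITEID", "ASSETNUM"` is being treated as one column name. A minimal sketch of one way around this, assuming the rule value arrives as a single comma-separated string: split it into a Python list and unpack the list into `countDistinct`. The helper names `parse_column_list` and `count_distinct_rows` are hypothetical, not part of PySpark.

```python
from typing import List


def parse_column_list(raw: str) -> List[str]:
    """Split a value such as '"SITEID", "ASSETNUM"' into ['SITEID', 'ASSETNUM']."""
    # Strip whitespace first, then any surrounding quote characters.
    names = [part.strip().strip("\"'") for part in raw.split(",")]
    return [name for name in names if name]


def count_distinct_rows(df, raw_columns: str) -> int:
    """Count distinct combinations of the columns named in raw_columns."""
    # Imported here so that the parsing helper above can be used without Spark.
    from pyspark.sql import functions as F

    columns = parse_column_list(raw_columns)
    if not columns:
        raise ValueError("no column names found in %r" % raw_columns)
    # Unpacking passes each name as its own argument:
    # countDistinct("SITEID", "ASSETNUM") rather than one combined string.
    row = df.select(F.countDistinct(*columns).alias("distinct_count")).first()
    return row["distinct_count"]
```

With a DataFrame `df` loaded from the table, `count_distinct_rows(df, '"SITEID", "ASSETNUM"')` would then count distinct (SITEID, ASSETNUM) pairs. Comparing that number with `df.count()` is one way to decide whether the "Unique" rule holds.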

Similarly, how can I count the records that do not match a specified date format?

I need to check how many records in INSTALLDATE are not in the format given in RuleDetails.
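A hedged sketch of one possible approach to the date check: parse the column with `to_date` using the format from RuleDetails and count the non-null values that fail to parse. This assumes INSTALLDATE is stored as a string, that RuleDetails holds a Spark datetime pattern such as `yyyy-MM-dd`, and that Spark is running with ANSI mode off, so that unparseable values become null instead of raising an error. The function name `count_invalid_dates` is hypothetical.

```python
def count_invalid_dates(df, column: str, date_format: str) -> int:
    """Count non-null values in `column` that do not parse with `date_format`."""
    if not column or not date_format:
        raise ValueError("column and date_format must be non-empty")

    # Imported here so that the argument check above runs without Spark.
    from pyspark.sql import functions as F

    # Assumption: with ANSI mode disabled, to_date returns null for values
    # that do not match the pattern.
    parsed = F.to_date(F.col(column).cast("string"), date_format)

    # Nulls in the source column are excluded; a NotNull rule can count those.
    return df.filter(F.col(column).isNotNull() & parsed.isNull()).count()
```

Calling `count_invalid_dates(df, "INSTALLDATE", rule_details)` with the RuleDetails value for that row would then give the number of non-conforming records. Parsing behaviour varies with the Spark version and the `spark.sql.legacy.timeParserPolicy` setting, so it is worth trying this on a few known good and bad values first.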

