I have a CSV file with the headers FileName, ColumnName, Rule and RuleDetails.
The Rule column holds several rules such as NotNull, Max and Min. The "Unique" rule can apply to several columns at once, so I need to pass all of those columns to countDistinct.
When I pass the column names dynamically instead of hardcoding them, I get the following error:
AnalysisException: Column '`"SITEID", "ASSETNUM"`' does not exist. Did you mean one of the following? [spark_catalog.maximo_dq.Assets_new.ASSETNUM, spark_catalog.maximo_dq.Assets_new.HasLD, spark_catalog.maximo_dq.Assets_new.SITEID, spark_catalog.maximo_dq.Assets_new.Status, spark_catalog.maximo_dq.Assets_new.SerialNumber, spark_catalog.maximo_dq.Assets_new.Description, spark_catalog.maximo_dq.Assets_new.InstallDate, spark_catalog.maximo_dq.Assets_new.Classification, spark_catalog.maximo_dq.Assets_new.LongDescription];
I also need to check how many records in INSTALLDATE do not match the format given in RuleDetails.
Use argument unpacking (the * operator) to pass the column names as separate arguments:
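from pyspark.sql.functions import countDistinct
unique_cols = ['a', 'b', 'c']  # keep the column names in a list, not in one string
df.select(countDistinct(*unique_cols))  # * expands the list into countDistinct('a', 'b', 'c')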
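To see why the * matters, here is a plain-Python sketch that needs no Spark. countDistinct is documented with the signature countDistinct(col, *cols); the function below is a hypothetical stand-in with the same shape that simply returns the names it received. Passing one string gives one argument, while unpacking a list gives one argument per column.

```python
from typing import Tuple


def count_distinct_stub(col: str, *cols: str) -> Tuple[str, ...]:
    """Hypothetical stand-in with the same (col, *cols) shape as countDistinct.

    It only returns the column names it was called with.
    """
    return (col, *cols)


names = ["SITEID", "ASSETNUM"]

# One string -> one argument, i.e. one "column" whose name is the whole string
print(count_distinct_stub('"SITEID", "ASSETNUM"'))  # ('"SITEID", "ASSETNUM"',)

# Unpacked list -> one argument per column name
print(count_distinct_stub(*names))  # ('SITEID', 'ASSETNUM')
```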
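In the question, the names come from the rules CSV, and the error shows Spark looking for a single column literally named "SITEID", "ASSETNUM". That suggests the whole cell was passed as one string. A minimal sketch, assuming the ColumnName cell for a Unique rule holds comma-separated, optionally double-quoted names: split it into a list first, then unpack it. The helper name split_column_names is hypothetical.

```python
import csv
from typing import List


def split_column_names(raw: str) -> List[str]:
    """Hypothetical helper: turn '"SITEID", "ASSETNUM"' into ['SITEID', 'ASSETNUM'].

    Assumes the names are comma-separated and may be wrapped in double quotes.
    """
    # skipinitialspace=True makes the csv parser ignore the blank after each
    # comma, so a quoted name such as "ASSETNUM" is still unquoted correctly.
    reader = csv.reader([raw], skipinitialspace=True)
    # Drop empty entries, e.g. from a trailing comma.
    return [name.strip() for name in next(reader) if name.strip()]


cols = split_column_names('"SITEID", "ASSETNUM"')
print(cols)  # ['SITEID', 'ASSETNUM']
```

With the list in hand, the call from the answer becomes df.select(countDistinct(*cols)).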