Reputation: 51
I have a DF dataframe which has below values
Email NAME DESIGNATION
MATCHED MISMATCHED MISMATCHED
MATCHED MISMATCHED MATCHED
I was able to find out the columns in DF dataframe using val comparecols = Df.columns
I want to find out those columns which has value MISMATCHED. Here in my example NAME and DESIGNATION. Since my dataframe is no consistent I am not sure how shall I use map and filter together
I have tried below
for (name <- comparecols){
df =df.filter(s"$name ='MATCHED'")
But its giving me result row wise, i.e. its checking row at a time and giving all the columns.
However I want to check column wise.
Upvotes: 1
Views: 201
Reputation: 42392
You can do a conditional sum/count:
val df2 = df.select(
df.columns.map(
c => sum(when(col(c) === "MISMATCHED", 1).otherwise(0)).as(c)
): _*
)
df2.show
+-----+----+-----------+
|Email|NAME|DESIGNATION|
+-----+----+-----------+
| 0| 2| 1|
+-----+----+-----------+
If you want to get the column names, you can do
val colnames = df2.select(
arrays_zip(array(df.columns.map(lit):_*), array(df.columns.map(col):_*)).as("c")
).selectExpr("inline(c) as (col, val)").filter("val != 0").select("col").rdd.map(_.getString(0)).collect
Upvotes: 2