I have a DataFrame with an "age" column and I want to count how many rows have age = 60, for example. I know how to solve this using select or df.count(), but I want to use selectExpr.
I tried
customerDfwithAge.selectExpr("count(when(col(age) = 60))")
but it returns:
Undefined function: 'col'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.;
If I remove col, it returns:
Invalid arguments for function when; line 1 pos 6
What is wrong?
If you want to use selectExpr, you need to provide a valid SQL expression. when() and col() are pyspark.sql.functions, not SQL expressions.
In your case, you should try:
customerDfwithAge.selectExpr("sum(case when age = 60 then 1 else 0 end)")
Bear in mind that I am using sum, not count. count would count every row (0s and 1s alike) and simply return the total number of rows in your DataFrame.