QueryQuasar

Reputation: 171

pyspark count with condition with selectExpr

I have a DataFrame with a column "age" and I want to count how many rows have age = 60, for example. I know how to solve this using select or df.count(), but I want to use selectExpr.

I tried

customerDfwithAge.selectExpr("count(when(col(age) = 60))")

but it returns

Undefined function: 'col'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.;

If I remove col, it returns

Invalid arguments for function when; line 1 pos 6

What is wrong?

Upvotes: 0

Views: 1428

Answers (1)

Ed_Ab

Reputation: 53

If you want to use selectExpr, you need to provide a valid SQL expression.

when() and col() are pyspark.sql.functions, not SQL expressions, so the SQL parser behind selectExpr does not recognize them.

In your case, you should try:

customerDfwithAge.selectExpr("sum(case when age = 60 then 1 else 0 end)")

Bear in mind that I am using sum, not count. count would count every non-null value (the 0s as well as the 1s), so it would simply return the total number of rows of your DataFrame.

Upvotes: 1
