Reputation: 1
I have just started learning SQL and Apache Spark.
I have imported a SQL table into Spark.
Now I need to find a success rate based on a field whose value should be 'yes',
i.e. the number of rows where that field is 'yes' divided by the total number of rows.
I was able to get the two counts separately, but I don't know how to combine the two queries.
sqlContext.sql("select count(*) from customers")
res51: org.apache.spark.sql.DataFrame = [_c0: bigint]
sqlContext.sql("select count(*) from customers where custSub = 'yes'")
res52: org.apache.spark.sql.DataFrame = [_c0: bigint]
Can I get the result using a single query, or do I need to perform some operation after storing the results of the individual queries?
Can you please help me out with this?
Upvotes: 0
Views: 362
Reputation: 1269873
Here is a nice little trick to get the rate using avg():
select avg(case when custSub = 'yes' then 1.0 else 0.0 end) as rate
from customers;
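If you are running this from the Spark shell as in the question, a minimal sketch of the same trick (assuming the same customers table and custSub column) looks like:
// avg() of a 1.0/0.0 flag is exactly the fraction of rows where custSub = 'yes'
sqlContext.sql("""
select avg(case when custSub = 'yes' then 1.0 else 0.0 end) as rate
from customers
""").show()
show() prints a one-row, one-column DataFrame holding the rate.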
Upvotes: 0
Reputation: 49260
You can use conditional aggregation to do this.
sqlContext.sql("""select count(case when custSub = 'yes' then 1 end)/count(*)
from customers
""")
Upvotes: 1