Reputation: 1
I have just started learning SQL and Apache Spark.
I have imported a SQL table into Spark.
Now I need to find a success rate based on a field whose value should be 'yes',
i.e. the number of rows where that field is 'yes' divided by the total number of rows.
I was able to get the two counts separately, but I don't know how to combine the two queries.
sqlContext.sql("select count(*) from customers")
res51: org.apache.spark.sql.DataFrame = [_c0: bigint]
sqlContext.sql("select count(*) from customers where custSub = 'yes'")
res52: org.apache.spark.sql.DataFrame = [_c0: bigint]
Can I get the result using a single query, or do I need to perform some operation after storing the results of the individual queries?
Can you please help me out with this?
Upvotes: 0
Views: 362
Reputation: 1269873
Here is a nice little trick to get the rate using avg():
select avg(case when custSub = 'yes' then 1.0 else 0.0 end) as rate
from customers;
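If you are running this from the Spark shell as in the question, a minimal sketch of the same trick (assuming the same customers table and custSub column) looks like:
// avg() of a 1.0/0.0 flag is exactly the fraction of rows where custSub = 'yes'
sqlContext.sql("""
select avg(case when custSub = 'yes' then 1.0 else 0.0 end) as rate
from customers
""").show()
show() prints a one-row, one-column DataFrame holding the rate.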
Upvotes: 0
Reputation: 49260
You can use conditional aggregation to do this.
sqlContext.sql("""select count(case when custSub = 'yes' then 1 end)/count(*)
from customers
""")
Upvotes: 1