thecoder

Reputation: 237

count and distinct count without groupby using PySpark

I have a DataFrame (testdf) and would like to get the count and distinct count of a column (memid) where another column (booking/rental) is neither null nor empty (i.e. "").

testdf:

memid   booking  rental
100     Y
100
120     Y
100     Y        Y

Expected result: (for booking column not null/ not empty)

count(memid)  count(distinct memid)
      3                      2

If it was SQL:

Select count(memid), count(distinct memid) from mydf
where booking is not null and booking != ''

In PySpark:

mydf.filter("booking != ''").groupBy('booking').agg(count("memid"), countDistinct("memid"))

But I just want the overall counts, without grouping.

Upvotes: 0

Views: 4323

Answers (1)

deronwu

Reputation: 126

You can just remove the groupBy and call agg directly on the filtered DataFrame:

from pyspark.sql import functions as F

mydf = mydf.filter("booking != ''").agg(F.count("memid"), F.countDistinct("memid"))

Note that the filter also drops nulls: in Spark SQL, booking != '' evaluates to null when booking is null, and filter treats null as false.
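As a quick sanity check on the sample data from the question (plain Python, no Spark needed — the tuples below are my own encoding of the table, with None/"" standing in for missing values), the same filter-then-count logic gives the expected result:

```python
# Rows from testdf: (memid, booking, rental)
rows = [
    (100, "Y", None),
    (100, None, None),
    (120, "Y", None),
    (100, "Y", "Y"),
]

# Keep memid where booking is neither null nor empty,
# mirroring filter("booking != ''") in the Spark answer.
booked = [memid for memid, booking, _ in rows if booking not in (None, "")]

count_memid = len(booked)          # count(memid)
distinct_memid = len(set(booked))  # count(distinct memid)
print(count_memid, distinct_memid)  # 3 2
```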

Upvotes: 4

Related Questions