pyspark: counting the number of occurrences of each distinct value

Question

I think the question is related to: Spark DataFrame: count distinct values of every column

I have a Spark DataFrame whose column A contains the values 1, 1, 2, 2, 1.

I want to count how many times each distinct value (here, 1 and 2) appears in column A and print something like:

distinct_values | number_of_appearance
1 | 3
2 | 2

cronoik · Accepted Answer

I am posting this because I think the other answer, which uses an alias, could be confusing. All you need are the groupBy and count methods:

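from pyspark.sql import SparkSession
from pyspark.sql.types import IntegerType

# Reuse the active session (already available as `spark` in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

values = [1, 1, 2, 2, 1]

# A DataFrame built from a list of atomic values gets a single column named "value"
df = spark.createDataFrame(values, IntegerType())

# Group the rows by value and count the rows in each group
df.groupBy('value').count().show()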

+-----+-----+
|value|count|
+-----+-----+
|    1|    3|
|    2|    2|
+-----+-----+
