mon

Reputation: 22254

Spark - How to word count without RDD

It looks like the RDD API is to be removed from Spark.

Announcement: DataFrame-based API is primary API

The RDD-based API is expected to be removed in Spark 3.0

How, then, should programs like word count be implemented in Spark?

Upvotes: 1

Views: 156

Answers (1)

ebonnal

Reputation: 1167

The data you manipulate as tuples with the RDD API can be thought of, and manipulated, as columns/fields in a SQL-like manner with the DataFrame API.

import org.apache.spark.sql.functions.{col, explode, split}

// df is a DataFrame with a single string column named "lines"
df.withColumn("word", explode(split(col("lines"), " ")))
  .groupBy("word")
  .count()
  .orderBy(col("count").desc())
  .show()
+---------+-----+
|     word|count|
+---------+-----+
|      foo|    5|
|      bar|    2|
|     toto|    1|
...
+---------+-----+

Notes:

  • This code snippet requires the necessary imports from org.apache.spark.sql.functions (col, explode, split).
  • Relevant examples can be found in this question's answers.
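To see what the DataFrame pipeline computes without running Spark, here is a conceptual sketch of the same word count using plain Scala collections; the input lines are made up to reproduce the counts shown above, and each step is annotated with its DataFrame counterpart:

```scala
// Sample input chosen to yield foo=5, bar=2, toto=1
val lines = Seq("foo bar foo", "foo toto bar", "foo foo")

val counts = lines
  .flatMap(_.split(" "))                 // explode(split(col("lines"), " "))
  .groupBy(identity)                     // groupBy("word")
  .map { case (w, ws) => (w, ws.size) }  // count()
  .toSeq
  .sortBy(-_._2)                         // orderBy(col("count").desc())

counts.foreach(println)                  // show()
```

The counts are distinct here, so the descending sort is deterministic and prints (foo,5), (bar,2), (toto,1), matching the table above.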

Upvotes: 1
