Reputation: 22254
It looks like the RDD API is going to be removed from Spark.
Announcement: DataFrame-based API is primary API
The RDD-based API is expected to be removed in Spark 3.0
How, then, should programs like word count be implemented in Spark?
Upvotes: 1
Views: 156
Reputation: 1167
The data you would manipulate as tuples with the RDD API can instead be thought of, and manipulated, as columns/fields in a SQL-like way with the DataFrame API.
import org.apache.spark.sql.functions.{col, explode, split}

// Split each line into words, count occurrences per word, most frequent first.
df.withColumn("word", explode(split(col("lines"), " ")))
  .groupBy("word")
  .count()
  .orderBy(col("count").desc)
  .show()
+---------+-----+
| word|count|
+---------+-----+
| foo| 5|
| bar| 2|
| toto| 1|
...
+---------+-----+
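For completeness, here is one way the df used above might be created. This is a minimal sketch: the input path "input.txt", the app name, and the local master are illustrative, and spark.read.text produces a single column named "value", which is renamed to "lines" to match the snippet.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("WordCount")   // app name is illustrative
  .master("local[*]")     // local mode, for illustration only
  .getOrCreate()

// spark.read.text yields one row per line of the file, in a column named "value";
// rename it to "lines" so the word-count snippet above applies as-is.
val df = spark.read.text("input.txt").toDF("lines")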
Note: explode, split and col come from org.apache.spark.sql.functions.
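For comparison, the tuple-based word count the question refers to would look roughly like this with the RDD API (a sketch, reusing the spark session from above; the input path is again illustrative):

val counts = spark.sparkContext
  .textFile("input.txt")            // RDD[String], one element per line
  .flatMap(_.split(" "))            // one element per word
  .map(word => (word, 1))           // pair each word with a count of 1
  .reduceByKey(_ + _)               // sum the counts per word
  .sortBy(_._2, ascending = false)  // most frequent words first

counts.take(10).foreach(println)

The DataFrame version expresses the same logic declaratively over named columns, which is what lets Spark's Catalyst optimizer plan the execution.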
Upvotes: 1