Reputation: 565
I'm new to Apache Spark and would like to take a dataset saved in JSON (a list of dictionaries), load it into an RDD, and then apply operations like filter and map. This seems like it should be simple, but after looking through Spark's docs, the only approach I found uses SQL queries (https://spark.apache.org/docs/1.1.0/sql-programming-guide.html), which is not how I'd like to interact with the RDD.
How can I load a dataset saved in JSON into an RDD? If I missed the relevant documentation, I'd appreciate a link.
Thanks!
Upvotes: 2
Views: 5619
Reputation: 127
Have you tried applying json.loads() in the map?
import json

# Parse each line of the file as a separate JSON object
lines = sc.textFile('/path/to/file')
d = lines.map(lambda line: json.loads(line))
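Once each line is parsed, d is an RDD of Python dicts, so filter and map work as usual. A minimal sketch of what comes next (the 'age' and 'name' keys are hypothetical, just to illustrate the pattern):

# d is an RDD of dicts; standard RDD operations apply from here on
adults = d.filter(lambda record: record.get('age', 0) >= 18)
names = adults.map(lambda record: record['name'])
print(names.take(5))

Note this assumes one JSON object per line; a file containing a single pretty-printed JSON array would need a different parsing strategy.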
Upvotes: 1
Reputation: 1858
You could do something like
import org.apache.spark.rdd.RDD
import org.json4s._
import org.json4s.native.JsonMethods._

val jsonData: RDD[JValue] = sc.textFile(path).flatMap(line => parseOpt(line))

Since parseOpt returns an Option[JValue], the flatMap silently drops any lines that fail to parse. You can then do your JSON processing on each JValue, like
jsonData.foreach { json =>
  println(json \ "someKey")
  (json \ "id") match {
    case JInt(x) => ???
    case _ => ???
  }
}
Upvotes: 3