Reputation: 905
I am playing with Spark RDDs and JSON files, and I am doing something like the below:
val uisJson5 = sqlContext.read.json(
  sc.textFile("s3n://localtion/*")
    .filter(line =>
      line.contains("\"xyz\":\"A\"")
        && line.contains("\"id\":\"adasdfasdfasd\"")))
uisJson5.show()
I also want to know which source JSON files the results are coming from. Is there any way I can do this?
Edit:
I was able to do it using the code below:
val uisJson1 = sc.textFile("s3n://localtion/*")
  .filter(line => line.contains("\"xyz\":\"A\"")
    && line.contains("\"id\":\"adasdfasdfasd\""))
uisJson1.collect().foreach(println)
Upvotes: 0
Views: 320
Reputation: 40380
You are looking for wholeTextFiles along with flatMapValues.
wholeTextFiles lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with textFile, which would return one record per line in each file.
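A minimal sketch of how the two could be combined, reusing the bucket path and filter strings from the question and assuming sc is an existing SparkContext (e.g. in spark-shell):
// wholeTextFiles yields (path, fileContent) pairs, so the source file name
// travels with every matching line after flatMapValues splits the content.
val matchesWithSource = sc
  .wholeTextFiles("s3n://localtion/*")        // RDD[(path, wholeFileContent)]
  .flatMapValues(_.split("\n"))               // RDD[(path, line)]
  .filter { case (_, line) =>
    line.contains("\"xyz\":\"A\"") &&
    line.contains("\"id\":\"adasdfasdfasd\"")
  }
// Each result now carries its source file alongside the matching JSON line.
matchesWithSource.collect().foreach { case (path, line) =>
  println(s"$path -> $line")
}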
Upvotes: 2