Amit Kumar

Reputation: 905

How to get file of RDD in spark

I am playing with Spark RDDs and JSON files, and I am doing something like the following:

val uisJson5 = sqlContext.read.json(
  sc.textFile("s3n://localtion/*")
    .filter(line =>
      line.contains("\"xyz\":\"A\"")
      && line.contains("\"id\":\"adasdfasdfasd\"")
    ))
uisJson5.show()

I also want to know which source JSON files the results are coming from. Is there any way I can do this?

Edit:

I was able to do it using the code below:

val uisJson1 = sc.textFile("s3n://localtion/*")
  .filter(line => line.contains("\"xyz\":\"A\"")
    && line.contains("\"id\":\"adasdfasdfasd\""))

uisJson1.collect().foreach(println)

Upvotes: 0

Views: 320

Answers (1)

eliasah

Reputation: 40380

You are looking for wholeTextFiles along with flatMapValues.

wholeTextFiles lets you read a directory containing multiple small text files, and returns each of them as (filename, content) pairs. This is in contrast with textFile, which would return one record per line in each file.
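Something along these lines should work (a minimal sketch reusing the path and filter strings from the question; the path and JSON values are the asker's placeholders):

val filesRdd = sc.wholeTextFiles("s3n://localtion/*")
// wholeTextFiles returns an RDD[(String, String)] of (filename, fileContent)

// flatMapValues splits each file's content into lines while keeping the
// filename as the key, so every matching line stays paired with its source file
val matches = filesRdd
  .flatMapValues(content => content.split("\n"))
  .filter { case (_, line) =>
    line.contains("\"xyz\":\"A\"") && line.contains("\"id\":\"adasdfasdfasd\"")
  }

// Each result is (sourceFile, matchingJsonLine)
matches.collect().foreach { case (file, line) => println(s"$file -> $line") }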

Upvotes: 2
