pg2455

Reputation: 5168

Convert an RDD to iterable: PySpark?

I have an RDD that I create by loading a text file and preprocessing it. I don't want to collect it and save the entire data set to disk or memory; instead, I want to pass it to another Python function that consumes the data one element at a time, as an iterable.

How is this possible?

data =  sc.textFile('file.txt').map(lambda x: some_func(x))

an_iterable = data. ##  what should I do here to make it give me one element at a time?
def model(an_iterable):
    for i in an_iterable:
        do_that(i)

model(an_iterable)

Upvotes: 15

Views: 20305

Answers (2)

Abdalla Issa Mbaideen

Reputation: 11

data = sc.textFile('file.txt').map(lambda x: some_func(x))
# collect() returns the whole RDD to the driver as a list, which you can then loop over
for i in data.collect():
    print(i)

Upvotes: -2

danf1024

Reputation: 421

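I believe what you want is toLocalIterator(), which returns an iterator over the RDD's elements and fetches them to the driver one partition at a time.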
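A minimal sketch of how that could look with the code from the question. The bodies of `some_func` and `model` are placeholders I made up for illustration, and `run_with_spark` is a hypothetical helper name; only `textFile`, `map`, and `toLocalIterator` come from the PySpark RDD API.

```python
def some_func(line):
    # Hypothetical preprocessing step standing in for the question's some_func.
    return line.strip().lower()


def model(an_iterable):
    # Consumes elements one at a time; accepts any Python iterable.
    # Counting is a placeholder for the question's do_that(i).
    processed = 0
    for _ in an_iterable:
        processed += 1
    return processed


def run_with_spark(path):
    # Imported here so the helpers above stay usable without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    data = sc.textFile(path).map(some_func)
    # toLocalIterator() yields elements lazily instead of building one big
    # list on the driver the way collect() does.
    return model(data.toLocalIterator())
```

Calling `run_with_spark('file.txt')` would stream the preprocessed lines through `model`. The driver should need roughly as much memory as the largest partition, so repartitioning first may help if individual partitions are very large.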

Upvotes: 20
