melwin_jose

Reputation: 325

Retrieving the i-th element of every entry in an RDD

entries = sc.textFile(...).map(lambda line: line.split("\t")).map(lambda row:(int(row[0]),row[1]))
some_set = set()
for entry in entries.collect():
    some_set.add(entry[1])

Is there a better way to do the above? I just want to get the i-th element of each entry, collected into a set.

Upvotes: 0

Views: 38

Answers (1)

user7337271

Reputation: 1712

So basically what you describe is the set of distinct values, since entry[1] is the second element of each (key, value) pair:

set(entries.values().distinct().collect())

or, generalized to the i-th element of each entry:

import operator
set(entries.map(operator.itemgetter(i)).distinct().collect())
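To make that concrete, here is a minimal sketch that wraps the generalized form in a function. The names `parse_line` and `distinct_ith` are hypothetical helpers introduced only for illustration; they are not part of PySpark. The sketch assumes `entries` is an RDD of tuples, as in the question. Calling `distinct()` before `collect()` means the deduplication happens on the cluster, so only the unique elements are sent to the driver.

```python
import operator


def parse_line(line):
    """Split a tab-separated line into an (int key, str value) pair.

    Hypothetical helper mirroring the two map() calls in the question.
    """
    fields = line.split("\t")
    return int(fields[0]), fields[1]


def distinct_ith(entries, i):
    """Return the set of distinct i-th elements of an RDD of tuples.

    Hypothetical helper: `entries` is assumed to expose the RDD methods
    map(), distinct() and collect().
    """
    # Deduplicate on the cluster first, then bring the result to the driver.
    return set(entries.map(operator.itemgetter(i)).distinct().collect())
```

With the question's data this would be used roughly as `distinct_ith(sc.textFile(...).map(parse_line), 1)`, where index 1 corresponds to `entry[1]` in the original loop.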

Upvotes: 1
