Reputation: 325
entries = sc.textFile(...).map(lambda line: line.split("\t")).map(lambda row:(int(row[0]),row[1]))
some_set = set()
for entry in entries.collect():
    some_set.add(entry[1])
Is there a better way to do the above? I just want to get the i-th element of each entry.
Upvotes: 0
Views: 38
Reputation: 1712
So basically what you describe is:
set(entries.values().distinct().collect())
or, generalized to the i-th element (this needs import operator):
set(entries.map(operator.itemgetter(i)).distinct().collect())
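For reference, here is a minimal sketch of the generalized form wrapped in a helper. The names distinct_field and demo are made up for illustration, and demo assumes you already have a SparkContext called sc, as in the question. Calling distinct() before collect() is optional, since set() would deduplicate anyway, but it means only the unique values are shipped back to the driver.

```python
import operator


def distinct_field(rdd, i):
    """Return the distinct i-th elements of an RDD of tuples as a Python set.

    Hypothetical helper, not part of PySpark. It only relies on the RDD
    methods map(), distinct() and collect().
    """
    # Deduplicate on the cluster first so collect() returns only unique values.
    return set(rdd.map(operator.itemgetter(i)).distinct().collect())


def demo(sc):
    """Usage sketch: sc is assumed to be an existing SparkContext."""
    # Small in-memory stand-in for the (int, str) pairs built in the question.
    entries = sc.parallelize([(1, "a"), (2, "b"), (3, "a")])
    return distinct_field(entries, 1)
```

With the RDD from the question, distinct_field(entries, 1) should give the same set as the loop over entries.collect().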
Upvotes: 1