Reputation: 85
Let's say I have an RDD like:
[(u'Some1', (u'ABC', 9989)),
(u'Some2', (u'XYZ', 235)),
(u'Some3', (u'BBB', 5379)),
(u'Some4', (u'ABC', 5379))]
I am using map to get one tuple at a time, but how can I access an individual element of a tuple, for example to check whether the tuple contains a certain string? Actually, I want to filter for the tuples that contain a given string, here the tuples that contain ABC.
I tried something like this, but it doesn't work:
def foo(line):
    if(line[1]=="ABC"):
        return (line)
new_data = data.map(foo)
I am new to both Spark and Python, so please help!
Upvotes: 6
Views: 8705
Reputation: 11573
RDDs can be filtered directly. The line below gives you all records that have "ABC" at index 0 of the tuple's second element.
new_data = data.filter(lambda x: x[1][0] == "ABC")
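To see why the map attempt returns nothing useful while the filter works, here is a minimal sketch that uses a plain Python list as a stand-in for the RDD, so it runs without Spark. The names records, has_abc, mapped, and kept are illustrative assumptions, not part of the original code.

```python
# Local stand-in for the RDD contents from the question (plain list, no Spark needed).
records = [
    ("Some1", ("ABC", 9989)),
    ("Some2", ("XYZ", 235)),
    ("Some3", ("BBB", 5379)),
    ("Some4", ("ABC", 5379)),
]

def foo(line):
    # line[1] is the whole inner tuple, e.g. ("ABC", 9989),
    # so comparing it to the string "ABC" is never True.
    if line[1] == "ABC":
        return line

# map emits exactly one output per input, so every non-matching
# record turns into None instead of being dropped.
mapped = list(map(foo, records))

# Predicate from the answer: index into the inner tuple first, then compare.
has_abc = lambda x: x[1][0] == "ABC"
kept = list(filter(has_abc, records))

print(mapped)  # [None, None, None, None]
print(kept)    # [('Some1', ('ABC', 9989)), ('Some4', ('ABC', 5379))]
```

On the actual RDD, the same predicate should carry over unchanged, for example data.filter(has_abc).collect() to bring the matching records back to the driver.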
Upvotes: 6