Abdalla Issa Mbaideen

Reputation: 11

pyspark filtering list from RDD

I have a file, names.txt.

Sample data:

hi hello hey

my name is jack

lets do it

And I have a list:

remove = ['it','name']

I created an RDD from names.txt. I want to filter out any element that contains a value from the list. The expected result is an RDD with one element:

hi hello hey

My code:

RDD = sc.textFile("myfiles/names.txt").map(lambda x: x.split())

remove = ['it','name']

result = RDD.filter(lambda X : "remove.values" not in X)

for i in result.collect() : print i

I think I need some kind of iteration here, but I can't get it to work. Thanks.

Upvotes: 1

Views: 2996

Answers (1)

pault

Reputation: 43504

You can use the built-in all() to filter out cases where any of the bad values match:

result = RDD.filter(lambda X: all(val not in X for val in remove))
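To see why this predicate works, here is a minimal sketch without a Spark cluster: each record is the list of words produced by x.split() in the question's map step, and the same all() condition keeps only records that contain none of the bad values. The records list below is just the question's sample data, hard-coded for illustration.

```python
# Each record mimics one line of names.txt after x.split()
records = [
    ['hi', 'hello', 'hey'],
    ['my', 'name', 'is', 'jack'],
    ['lets', 'do', 'it'],
]

remove = ['it', 'name']

# all(...) is True only when no value from `remove` appears in the record;
# RDD.filter applies the same predicate to each element in parallel.
result = [rec for rec in records if all(val not in rec for val in remove)]

print(result)  # [['hi', 'hello', 'hey']]
```

The original attempt failed because "remove.values" is a literal string, so the filter checked whether that exact string was one of the words in each record, which is never the case; the generator expression iterates over the actual list instead.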

Upvotes: 2
