Reputation: 303
I have an array of invalid strings:
arr_invalid = ['aks', 'rabbbit', 'dog']
I am parsing an RDD with a lambda function and need to skip any input string that contains one of these invalid strings.
For example, if the input string is akss
or aks,
both should be ignored.
How do I achieve this without writing a separate filter for each invalid string?
Upvotes: 1
Views: 4992
Reputation: 180522
Unless the words come sorted, you need to compare each string against every invalid string. You can use any
to check whether any of the invalid substrings occurs in each string:
arr_invalid = ['aks', 'rabbbit', 'dog']
strings = [ "aks", "akss","foo", "saks"]
filt = list(filter(lambda x: not any(s in x.lower() for s in arr_invalid),strings))
Output:
['foo']
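Since the question mentions an RDD, the same check can be written as a named predicate instead of an inline lambda. This is only a minimal sketch: is_valid is a hypothetical helper name, and rdd is assumed to be an existing RDD of strings, so the Spark line is left as a comment and the block runs without Spark.

```python
arr_invalid = ['aks', 'rabbbit', 'dog']

def is_valid(line, invalid=arr_invalid):
    """Return True if no invalid substring occurs in line (case-insensitive)."""
    lowered = line.lower()
    return not any(s in lowered for s in invalid)

strings = ["aks", "akss", "foo", "saks"]
print(list(filter(is_valid, strings)))  # ['foo']

# With Spark, the same predicate could be passed to the RDD's filter method,
# assuming `rdd` is an existing RDD of strings (hypothetical name):
# clean = rdd.filter(is_valid)
```

A named function keeps the matching rule in one place, so switching from substring matching to prefix matching only means changing the body of is_valid.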
If you only want to exclude the strings if they start with one of the substrings:
t = tuple(arr_invalid)
filt = list(filter(lambda x: not x.lower().startswith(t), strings))
Output:
['foo', 'saks']
If the input is a single string, just split it first:
st = "foo akss saks aks"
t = tuple(arr_invalid)
filt = list(filter(lambda x: not x.startswith(t),st.lower().split()))
You can also just use a list comprehension:
[s for s in st.lower().split() if not s.startswith(t)]
As poke commented, you could find exact matches with a set, but you will still need to combine it with either any and in or str.startswith to match substrings:
arr_invalid = {'aks', 'rabbbit', 'dog'}
st = "foo akss saks aks"
t = tuple(arr_invalid)
filt = list(filter(lambda s: s not in arr_invalid and not s.startswith(t), st.lower().split()))
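To make the difference between the three matching rules concrete, here is a small side-by-side sketch on the same input. The variable names exact, prefix and substring are only illustrative.

```python
arr_invalid = {'aks', 'rabbbit', 'dog'}
prefixes = tuple(arr_invalid)  # str.startswith accepts a tuple, not a set
words = "foo akss saks aks".lower().split()

# Exact match only: a set gives constant-time membership tests.
exact = [w for w in words if w not in arr_invalid]

# Prefix match: drops words that start with any invalid string.
prefix = [w for w in words if not w.startswith(prefixes)]

# Substring match: drops words that contain any invalid string anywhere.
substring = [w for w in words if not any(s in w for s in arr_invalid)]

print(exact)      # ['foo', 'akss', 'saks']
print(prefix)     # ['foo', 'saks']
print(substring)  # ['foo']
```

Only the substring rule removes both akss and aks as asked in the question while also catching saks; the set lookup alone misses akss.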
Upvotes: 3