Reputation: 3335
Is there a way to find out if a string contains any one of the characters in a set with python?
It's straightforward to do it with a single character, but I need to check and see if a string contains any one of a set of bad characters.
Specifically, suppose I have a string:
s = 'amanaplanacanalpanama~012345'
and I want to see if the string contains any vowels:
bad_chars = 'aeiou'
and do this in a for loop for each line in a file:
if [any one or more of the bad_chars] in s:
do something
I am scanning a large file so if there is a faster method to this, that would be ideal. Also, not every bad character has to be checked---so long as one is encountered that is enough to end the search.
I'm not sure if there is a builtin function or easy way to implement this, but I haven't come across anything yet. Any pointers would be much appreciated!
Upvotes: 5
Views: 4169
Reputation: 91092
any((c in badChars) for c in yourString)
or
any((c in yourString) for c in badChars) # extensionally equivalent, slower
or
set(yourString) & set(badChars) # extensionally equivalent, slower
"so long as one is encountered that is enough to end the search." - This will be true if you use the first method.
You say you are concerned with performance: performance should not be an issue unless you are dealing with a huge amount of data. If you encounter issues, you can try:
Regexes
edit Previously I had written a section here on using regexes, via the re
module, programatically generating a regex that consisted of a single character-class [...]
and using .finditer
, with the caveat that putting a simple backslash before everything might not work correctly. Indeed, after testing it, that is the case, and I would definitely not recommend this method. Using this would require reverse engineering the entire (slightly complex) sub-grammar of regex character classes (e.g. you might have characters like \
followed by w
, like ]
or [
, or like -
, and merely escaping some like \w
may give it a new meaning).
Sets
Depending on whether the str.__contains__
operation is O(1) or O(N), it may be justifiable to first convert your text/lines into a set to ensure the in
operation is O(1), if you have many badChars:
badCharSet = set(badChars)
any((c in badChars) for c in yourString)
(it may be possible to make that a one-liner any((c in set(yourString)) for c in badChars)
, depending on how smart the python compiler is)
Do you really need to do this line-by-line?
It may be faster to do this once for the entire file O(#badchars), than once for every line in the file O(#lines*#badchars), though the asymptotic constants may be such that it won't matter.
Upvotes: 9
Reputation: 7271
The following Python code should print out any character in bad_chars if it exists in s:
for i in vowels:
if i in your charset:
#do_something
You could also use the python in-built any using an example like this:
>>> any(e for e in bad_chars if e in s)
True
Upvotes: 0
Reputation: 308206
This regular expression is twice as fast as any
with my minimal testing. You should try it with your own data.
r = re.compile('[aeiou]')
if r.search(s):
# do something
Upvotes: 1
Reputation: 2909
This should be very efficient and clear. It uses sets:
#!/usr/bin/python
bad_chars = set('aeiou')
with open('/etc/passwd', 'r') as file_:
file_string = file_.read()
file_chars = set(file_string)
if file_chars & bad_chars:
print('found something bad')
Upvotes: 2
Reputation: 14116
Use python's any
function.
if any((bad_char in my_string) for bad_char in bad_chars):
# do something
Upvotes: 4