Reputation: 128
How do I check whether a Python string contains characters that are not in a given set?
Here is the list (set):
set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")
Upvotes: 4
Views: 5577
Reputation: 27360
Comparing the runtime of the different solutions:
import timeit

search_strings = [
    '"#12"',              # short string, early match
    '"#1234567"',         # longer string, early match
    '"1234567#"',         # longer string, late match
    '"123" * 100 + "#"',  # long string, late match
    '"123#" * 100',       # long string, early match
]

# Each entry is (statement to time, setup code); {} is replaced by the search string.
# The first setup string is raw so that \w reaches re.compile unchanged.
algorithms = [
    ("r.search(s)", r's={};import re; r = re.compile(r"[^-.\w]")'),
    ("set(s) - SET", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
    ("any(x not in SET for x in s)", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
    ("SET.issuperset(s)", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
]

for alg, setup in algorithms:
    print(alg)
    for sstr in search_strings:
        # timeit.timeit runs the statement 1,000,000 times by default
        print("%35s %.3f" % (sstr[:35], timeit.timeit(alg, setup.format(sstr))))
which gives the following output on my machine:
r.search(s)
"#12" 0.470
"#1234567" 0.514
"1234567#" 0.572
"123" * 100 + "#" 3.493
"123#" * 100 0.502
set(s) - SET
"#12" 0.566
"#1234567" 1.045
"1234567#" 1.075
"123" * 100 + "#" 7.658
"123#" * 100 10.170
any(x not in SET for x in s)
"#12" 0.786
"#1234567" 0.797
"1234567#" 1.475
"123" * 100 + "#" 27.266
"123#" * 100 1.087
SET.issuperset(s)
"#12" 0.466
"#1234567" 0.864
"1234567#" 0.896
"123" * 100 + "#" 7.512
"123#" * 100 10.199
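The regex solution is the fastest or close to it in every case. On the short strings SET.issuperset(s) is about as fast, but on the long strings the regex is at least twice as fast as the set-based approaches. The timings for "123#" * 100 suggest that the set-based approaches process the whole string even when an offending character appears early, whereas the regex and any() stop at the first one.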
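One caveat about the regex used in the benchmark: on Python 3, \w in a str pattern also matches non-ASCII letters and digits, so [^-.\w] is not exactly equivalent to the set from the question. Here is a minimal sketch of the difference, assuming Python 3. The names ALLOWED, loose, strict and explicit are made up for illustration.

```python
import re

ALLOWED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._"
)

loose = re.compile(r"[^-.\w]")             # \w is Unicode-aware on Python 3
strict = re.compile(r"[^-.\w]", re.ASCII)  # \w restricted to [a-zA-Z0-9_]
explicit = re.compile(r"[^A-Za-z0-9._-]")  # spells the allowed set out

s = "caf\u00e9"  # "café": the last character is not in ALLOWED

print(loose.search(s) is not None)     # False: the accented letter counts as \w
print(strict.search(s) is not None)    # True
print(explicit.search(s) is not None)  # True
print(not ALLOWED.issuperset(s))       # True
```

If the input may contain non-ASCII text, either the re.ASCII flag or the explicit character class keeps the regex in agreement with the set-based checks.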
Upvotes: 1
Reputation: 51162
You want to test whether the characters in the string are not a subset of the given set of characters. That is straightforward in Python because the <= operator tests if one set is a subset of another.
import string

# don't use a mutable set for this purpose
GIVEN = frozenset(string.ascii_letters + string.digits + '-._')

def uses_other_chars(s, given=GIVEN):
    return not set(s) <= given
Examples:
>>> uses_other_chars('abc')
False
>>> uses_other_chars('Hello!')
True
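If you also want to know which characters are the offending ones, set difference gives them directly. This is a small sketch along the same lines. The helper name other_chars is hypothetical and not part of the answer above.

```python
import string

GIVEN = frozenset(string.ascii_letters + string.digits + "-._")

def other_chars(s, given=GIVEN):
    """Return the set of characters in s that are not in given."""
    return set(s) - given

print(sorted(other_chars("Hello, world!")))  # [' ', '!', ',']
print(other_chars("abc-1.2_3"))              # set()
```

An empty result means every character of the string is in the given set.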
Upvotes: 3
Reputation: 89
I always defer to regular expressions when validating strings.
To define a character set, enclose its characters in [].
To match any character that is not in the set, put ^ immediately after the opening bracket.
To match one or more consecutive characters from the set, append + after the closing bracket.
Given this information, a regular expression that matches any characters other than {a,b,c,d} would look like this:
[^abcd]+
(note that this is case sensitive)
To use regular expressions in Python, import re. The re.search(pattern, string, flags=0) function scans the string for the first place where the pattern matches and returns a match object, or None if there is no match.
More information on regular expressions in python can be found here. A simple regular expression tester can be found here.
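Applied to the set from the question, the approach above might look like the following sketch. The names DISALLOWED and has_other_chars are made up for illustration, and the trailing + is left out because a single offending character is enough for a yes/no answer.

```python
import re

# Matches any single character outside the allowed set.
# The hyphen comes last in the class so that it is taken literally.
DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")

def has_other_chars(s):
    """Return True if s contains a character outside the allowed set."""
    return DISALLOWED.search(s) is not None

print(has_other_chars("file-name_1.txt"))  # False
print(has_other_chars("file name.txt"))    # True

# The match object also tells you what was found and where.
m = DISALLOWED.search("abc#def")
print(m.group(), m.start())  # prints: # 3
```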
Upvotes: 6
Reputation: 3519
Use any to check each character of the string against SET:
SET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")
s = "123#"
print(any(x not in SET for x in s))
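The same generator idea can also report where the first offending character is. This is a sketch only: the helper first_bad_char is hypothetical, and it uses next() with a default instead of any().

```python
SET = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")

def first_bad_char(s, allowed=SET):
    """Return (index, char) for the first char of s not in allowed, or None."""
    return next(((i, c) for i, c in enumerate(s) if c not in allowed), None)

print(first_bad_char("123#"))  # (3, '#')
print(first_bad_char("123"))   # None
```

Like any(), this stops at the first character that is not in the set.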
Upvotes: 1