Matix

Reputation: 128

How to check if string contains chars which are not in a list?

How do I check whether a Python string contains characters that are not in a given list?

Here is the list (set):

set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")

Upvotes: 4

Views: 5577

Answers (4)

thebjorn

Reputation: 27360

Comparing the runtime of the different solutions:

import timeit

search_strings = [
    '"#12"',                     # short string, early match
    '"#1234567"',                # longer string, early match
    '"1234567#"',                # longer string, late match
    '"123" * 100 + "#"',         # long string, late match
    '"123#" * 100',              # long string, early match
]

algorithms = [
    ("r.search(s)", 's={};import re; r = re.compile(r"[^-.\w]")'),
    ("set(s) - SET", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
    ("any(x not in SET for x in s)", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
    ("SET.issuperset(s)", 's={};SET=frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")'),
]

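# timeit.timeit runs each statement 1,000,000 times by default; results are in seconds
for alg, setup in algorithms:
    print(alg)
    for sstr in search_strings:
        # substitute the search-string expression into the setup code, then time alg
        print("%35s %.3f" % (sstr[:35], timeit.timeit(alg, setup.format(sstr))))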

which gives the following output on my machine:

r.search(s)
                              "#12" 0.470
                         "#1234567" 0.514
                         "1234567#" 0.572
                  "123" * 100 + "#" 3.493
                       "123#" * 100 0.502
set(s) - SET
                              "#12" 0.566
                         "#1234567" 1.045
                         "1234567#" 1.075
                  "123" * 100 + "#" 7.658
                       "123#" * 100 10.170
any(x not in SET for x in s)
                              "#12" 0.786
                         "#1234567" 0.797
                         "1234567#" 1.475
                  "123" * 100 + "#" 27.266
                       "123#" * 100 1.087
SET.issuperset(s)
                              "#12" 0.466
                         "#1234567" 0.864
                         "1234567#" 0.896
                  "123" * 100 + "#" 7.512
                       "123#" * 100 10.199

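The regex solution is the fastest in every case except the shortest string, where SET.issuperset(s) is marginally ahead (0.466 vs. 0.470), and its advantage is largest on the long strings.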
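
One caveat if this benchmark is rerun on Python 3: for str patterns, \w also matches non-ASCII letters and digits unless re.ASCII is passed, so [^-.\w] is then no longer equivalent to the set in the question. A minimal sketch of the difference (the variable names are only for illustration):

```python
import re

# Python 3 default for str patterns: \w is Unicode-aware
unicode_pattern = re.compile(r"[^-.\w]")
# With re.ASCII, \w is restricted to [a-zA-Z0-9_]
ascii_pattern = re.compile(r"[^-.\w]", re.ASCII)

s = "caf\u00e9"  # ends in 'é', a letter that is not in the question's set

print(unicode_pattern.search(s) is None)    # True: 'é' counts as \w, nothing is flagged
print(ascii_pattern.search(s) is not None)  # True: 'é' is flagged as disallowed
```

For pure-ASCII input, such as the search strings timed above, both patterns flag the same characters.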

Upvotes: 1

kaya3

Reputation: 51162

You want to test whether the characters in the string are not a subset of the given set of characters. That is straightforward in Python because the <= operator tests if one set is a subset of another.

import string

# don't use a mutable set for this purpose
GIVEN = frozenset(string.ascii_letters + string.digits + '-._')

def uses_other_chars(s, given=GIVEN):
    return not set(s) <= given

Examples:

>>> uses_other_chars('abc')
False
>>> uses_other_chars('Hello!')
True
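
If it is also useful to know which characters are the offending ones, the same set idea extends to a set difference. This is a sketch rather than part of the answer above; other_chars is a hypothetical helper name:

```python
import string

GIVEN = frozenset(string.ascii_letters + string.digits + "-._")

def other_chars(s, given=GIVEN):
    """Return the characters of s that are not in given, sorted for stable output."""
    return sorted(set(s) - given)

print(other_chars("abc"))          # []
print(other_chars("Hello, you!"))  # [' ', '!', ',']
```

An empty result means the string uses only the given characters.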

Upvotes: 3

Seivnekn

Reputation: 89

I always defer to regular expressions when validating strings.

To define a character class (a set of characters), enclose its characters in []. To match any character that is not in the class, put ^ right after the opening bracket. To match one or more consecutive characters from the class, append +.

Given this information, a regular expression to check if a string contains any characters other than {a,b,c,d} would look like this:

[^abcd]+ (note that this is case sensitive)

To use regular expressions in Python, import re. The re.search(pattern, string, flags=0) function scans through the string for the first location where the pattern matches and returns a match object, or None if there is no match.
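
Applied to the character set from the question, this could look like the following sketch. The names DISALLOWED and has_disallowed_chars are made up for illustration:

```python
import re

# Negated character class: matches any single character that is NOT
# an ASCII letter, a digit, '.', '_' or '-' (a trailing '-' is literal).
DISALLOWED = re.compile(r"[^A-Za-z0-9._-]")

def has_disallowed_chars(s):
    """Return True if s contains at least one character outside the allowed set."""
    return DISALLOWED.search(s) is not None

print(has_disallowed_chars("file-name_1.txt"))  # False
print(has_disallowed_chars("Hello!"))           # True
```

The trailing + is not needed here, because a single disallowed character is already enough for search to report a match.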

More information on regular expressions in Python can be found in the official documentation for the re module. An online regular expression tester is also handy for trying out patterns.

Upvotes: 6

Poojan

Reputation: 3519

  • Use any() to check, for each character of the string, whether it is missing from SET:
SET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._")
s = "123#"
print(any(x not in SET for x in s))

Upvotes: 1
