Nicholas Picklas

Reputation: 93

How to see if a string ONLY contains a substring in python

I need to check whether a string contains only a given substring (or a single character), possibly repeated, and nothing else.

Say I want to detect World.

This contains the substring, but it also contains other characters, so it should not pass:

"Hello World"

This contains nothing but the substring, repeated 3 times, so it should pass:

"WorldWorldWorld"

If I want to detect _:

This should not pass:

"Hello_World"

But this should:

"___"

How do I do this?

Upvotes: 1

Views: 2370

Answers (5)

Timur Shtatland

Reputation: 12465

Method 1:

Without regular expressions (regexes), one can simply use sets. First, split the string s into consecutive chunks of the same length as the substring substr. Make a set s_set out of these chunks. If that set has exactly 1 element, and that element is substr, print True; otherwise print False.

strs = ["WorldWorldWorld", "Hello World"]
substr = "World"
len_substr = len(substr)

for s in strs:
    s_set = set(s[i:(i + len_substr)] for i in range(0, len(s), len_substr))
    print(len(s_set) == 1 and substr in s_set)
# True
# False
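A minimal sketch wrapping Method 1 in a reusable function and running it on all four examples from the question, including the single-character `_` case. The name only_substr_set is hypothetical, and the empty-substring guard is an added assumption (range() raises ValueError for a step of 0). Comparing the chunk set to {substr} is the same test as `len(s_set) == 1 and substr in s_set`.

```python
def only_substr_set(s: str, substr: str) -> bool:
    """Method 1 as a function: True if s is one or more copies of substr."""
    n = len(substr)
    if n == 0:
        # assumption: an empty substr never matches (range() rejects step 0)
        return False
    # chunk s into len(substr)-sized pieces; the last piece may be shorter
    chunks = {s[i:i + n] for i in range(0, len(s), n)}
    # same test as len(s_set) == 1 and substr in s_set
    return chunks == {substr}


# the examples from the question
for s, sub in [("Hello World", "World"), ("WorldWorldWorld", "World"),
               ("Hello_World", "_"), ("___", "_")]:
    print(repr(s), only_substr_set(s, sub))
```

This prints False, True, False, True for the four examples. A trailing partial chunk (for example "WorldWor") ends up in the set as "Wor", so such strings are correctly rejected.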

Method 2:

If speed is important, then for very long strings, it makes sense to stop as soon as the first non-matching substring is found, as in this solution:

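# reuses strs, substr and len_substr from Method 1
for s in strs:
    only_substr = len(s) > 0  # False for an empty string, matching Method 1
    for i in range(0, len(s), len_substr):
        cur_substr = s[i:(i + len_substr)]
        if cur_substr != substr:
            only_substr = False
            break
    print(only_substr)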
# True
# False
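The same early-exit idea can be written more compactly with the built-in all(), which stops at the first chunk that differs from substr. This is a sketch, not the answer's original code: the function name only_substr_all is hypothetical, and the extra length check (a string whose length is not a multiple of len(substr) cannot be whole copies) is an added shortcut.

```python
def only_substr_all(s: str, substr: str) -> bool:
    """Early-exit check: all() stops at the first chunk that differs."""
    n = len(substr)
    # cheap rejections first: empty inputs (treated as "no", like Method 1)
    # and lengths that cannot be a whole number of copies
    if n == 0 or not s or len(s) % n != 0:
        return False
    return all(s[i:i + n] == substr for i in range(0, len(s), n))


print(only_substr_all("WorldWorldWorld", "World"))  # True
print(only_substr_all("Hello World", "World"))      # False
```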

Upvotes: 0

wim

Reputation: 363476

No regex necessary. Because str.count counts non-overlapping occurrences, the string consists only of the target exactly when:

len(target) * data.count(target) == len(data)
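A small sketch of the count check as a function (the name only_target_count is hypothetical), with the arithmetic spelled out in the comments. The "aaa"/"aa" case shows why non-overlapping counting matters: "aa" fits only once without overlap, and 2 * 1 != 3. Note that an empty data string passes (0 == 0); prepend `data and` if that is not wanted.

```python
def only_target_count(data: str, target: str) -> bool:
    """True if data consists only of back-to-back copies of target.

    str.count counts non-overlapping occurrences, so those occurrences can
    only cover all of data if their total length equals len(data).
    """
    return len(target) * data.count(target) == len(data)


print(only_target_count("WorldWorldWorld", "World"))  # 5 * 3 == 15 -> True
print(only_target_count("Hello World", "World"))      # 5 * 1 != 11 -> False
print(only_target_count("aaa", "aa"))                 # 2 * 1 != 3 -> False
```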

Simple string methods are roughly 4 to 9 times faster than the regex here (timed with IPython's %timeit):

>>> import re
>>> target = "World"
>>> data = "World" * 3
>>> pattern = f"^({re.escape(target)})+$"
>>> %timeit len(target) * data.count(target) == len(data)
115 ns ± 0.352 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit re.match(pattern, data) is not None
456 ns ± 2.88 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit bool(data.replace(target, ''))  # str.replace is faster again; an empty remainder (bool() is False) means data is only target
51.7 ns ± 0.269 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
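One caveat on the timed replace expression: bool(data.replace(target, '')) is True when something other than the target remains, so on its own it answers the opposite question. A minimal sketch of the replace check with the negation applied (the function name only_target_replace is hypothetical):

```python
def only_target_replace(data: str, target: str) -> bool:
    # replace() deletes every non-overlapping copy of target in one
    # left-to-right pass; the remainder is empty only if data was made
    # entirely of copies of target
    return not data.replace(target, "")


print(only_target_replace("WorldWorldWorld", "World"))  # True
print(only_target_replace("Hello_World", "_"))          # False
print(bool("WorldWorldWorld".replace("World", "")))     # False: bool() alone is inverted
```

Because replace() makes a single pass, a string such as "WoWorldrld" leaves "World" behind and is correctly rejected.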

Upvotes: 2

BrokenBenchmark

Reputation: 19252

You can use a regular expression: build a pattern with re.escape that matches one or more consecutive occurrences of the target (with ^ and $ marking the beginning and end of the string, respectively), then use re.match to check whether the data matches it:

import re

target = "World"
data = "World" * 3

pattern = f"^({re.escape(target)})+$"
print(re.match(pattern, data) is not None)

This outputs:

True
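A variant worth considering, sketched under the assumption of Python 3.4+ (where re.fullmatch exists): without the MULTILINE flag, $ also matches just before a trailing newline, so the ^...$ pattern accepts "WorldWorld\n". re.fullmatch requires the whole string to match and needs no anchors. The function name only_target_regex is hypothetical.

```python
import re


def only_target_regex(data: str, target: str) -> bool:
    # non-capturing group repeated one or more times; fullmatch must
    # consume the entire string, so no ^/$ anchors are needed
    pattern = f"(?:{re.escape(target)})+"
    return re.fullmatch(pattern, data) is not None


print(only_target_regex("WorldWorldWorld", "World"))  # True
print(only_target_regex("WorldWorld\n", "World"))     # False

# with ^...$ and re.match, "$" also matches before a trailing newline
print(re.match(f"^({re.escape('World')})+$", "WorldWorld\n") is not None)  # True
```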

Upvotes: 1

ljmc

Reputation: 5315

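This is a job for regular expressions and re.match(); the \Z anchor makes sure nothing follows the last World:

import re

re.match(r"(?:World)+\Z", "World")            # match object
re.match(r"(?:World)+\Z", "Hello World")      # None
re.match(r"(?:World)+\Z", "WorldWorldWorld")  # match object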
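re.match() anchors only at the start of the string, so an unanchored (?:World)+ also accepts strings that merely begin with World. A short demonstration of the pitfall and of the \Z end-of-string anchor (unlike $, \Z does not also match before a trailing newline):

```python
import re

# re.match anchors only at the start: a matching prefix is enough,
# so trailing text is silently accepted
print(bool(re.match(r"(?:World)+", "WorldHello")))    # True
# \Z anchors at the very end of the string, so trailing text is rejected
print(bool(re.match(r"(?:World)+\Z", "WorldHello")))  # False
print(bool(re.match(r"(?:World)+\Z", "WorldWorld")))  # True
```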

Upvotes: -1

chepner
chepner

Reputation: 532303

Use a regular expression.

if re.match(r"(?:World)+\Z", s):
    print("s is only World, repeated")

This only succeeds if s consists of one or more repetitions of the string World and nothing else; the \Z anchor rejects anything after the last World.

Upvotes: -1
