Reputation: 93
I need to be able to check whether a string consists only of a given substring or letter (possibly repeated), and nothing else.
Say I wanted to detect World.
This contains the substring, but it also has other letters in a different order, so it shouldn't pass:
"Hello World"
This contains no other letters or ordering, just the substring three times, so it should pass:
"WorldWorldWorld"
If I wanted to detect _, this wouldn't pass:
"Hello_World"
But this would:
"___"
How do I do this?
Upvotes: 1
Views: 2370
Reputation: 12465
Method 1:
Without regular expressions (regexes), one can simply use sets. First, split the string s in question into chunks of the same length as the substring substr. Make a set s_set out of these chunks. If that set has exactly one element, and that element is substr, then print True; otherwise, print False.
strs = ["WorldWorldWorld", "Hello World"]
substr = "World"
len_substr = len(substr)
for s in strs:
    # Cut s into consecutive chunks of len(substr) characters.
    s_set = set(s[i:(i + len_substr)] for i in range(0, len(s), len_substr))
    print(len(s_set) == 1 and substr in s_set)
# True
# False
Method 2:
If speed is important, then for very long strings, it makes sense to stop as soon as the first non-matching substring is found, as in this solution:
for s in strs:
    only_substr = True
    for i in range(0, len(s), len_substr):
        cur_substr = s[i:(i + len_substr)]
        if cur_substr != substr:
            only_substr = False
            break
    print(only_substr)
# True
# False
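The early-exit idea from Method 2 can also be packaged as a small function. This is only a sketch: the name only_repeats is made up for illustration, and it assumes that an empty string or an empty substring should count as "no match", which the answer above does not specify.

```python
def only_repeats(s: str, substr: str) -> bool:
    """Return True if s is one or more back-to-back copies of substr."""
    n = len(substr)
    # Assumption: empty inputs are rejected, as are lengths that
    # cannot be tiled exactly by substr.
    if n == 0 or len(s) == 0 or len(s) % n != 0:
        return False
    # all() stops at the first chunk that differs from substr.
    return all(s[i:i + n] == substr for i in range(0, len(s), n))


for s in ["WorldWorldWorld", "Hello World", "___", "Hello_World"]:
    print(repr(s), only_repeats(s, "World"), only_repeats(s, "_"))
```

The length check up front means a string such as "WorldWor" is rejected without comparing any chunks.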
Upvotes: 0
Reputation: 363476
No regex necessary. This relies on the fact that str.count counts non-overlapping occurrences:
len(target) * data.count(target) == len(data)
Simple string methods are roughly 4 to 9 times as fast as the regex here:
>>> import re
>>> target = "World"
>>> data = "World" * 3
>>> pattern = f"^({re.escape(target)})+$"
>>> %timeit len(target) * data.count(target) == len(data)
115 ns ± 0.352 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
>>> %timeit re.match(pattern, data) is not None
456 ns ± 2.88 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>>> %timeit bool(data.replace(target, '')) # str.replace is faster again (the check itself is: not data.replace(target, ''))
51.7 ns ± 0.269 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
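One thing to watch with the bare count expression is that it evaluates to True when data is the empty string, since 0 occurrences times any length equals 0. A minimal sketch of a wrapper that rejects that case follows; the function name consists_only_of is hypothetical, and treating empty input as "no match" is an assumption about what the asker wants.

```python
def consists_only_of(data: str, target: str) -> bool:
    """Hypothetical helper: True if data is one or more copies of target."""
    # Assumption: an empty target or empty data never qualifies.
    if not target or not data:
        return False
    # str.count counts non-overlapping occurrences, so they cover the
    # whole string exactly when their total length equals len(data).
    return len(target) * data.count(target) == len(data)


print(consists_only_of("WorldWorldWorld", "World"))
print(consists_only_of("Hello World", "World"))
print(consists_only_of("___", "_"))
print(consists_only_of("", "World"))
```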
Upvotes: 2
Reputation: 19252
You can use a regular expression: use re.escape to build a pattern that matches one or more consecutive occurrences of the target (with ^ and $ marking the beginning and end of the string, respectively), and re.match to determine whether the string matches that pattern:
import re
target = "World"
data = "World" * 3
pattern = f"^({re.escape(target)})+$"
print(re.match(pattern, data) is not None)
This outputs:
True
Upvotes: 1
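A possible variation, not part of the answer above, is re.fullmatch, which requires the whole string to match and so needs no ^ or $. The difference shows up with a trailing newline: $ also matches just before a newline at the end of the string, while fullmatch does not accept it. The helper name only_repeats_re below is made up for this sketch.

```python
import re

target = "World"
# Non-capturing group; fullmatch requires the entire string to match.
pattern = re.compile(f"(?:{re.escape(target)})+")


def only_repeats_re(data: str) -> bool:
    return pattern.fullmatch(data) is not None


print(only_repeats_re("WorldWorldWorld"))  # True
print(only_repeats_re("Hello World"))      # False
print(only_repeats_re("World\n"))          # False
# The anchored re.match version accepts the trailing newline:
print(re.match(f"^({re.escape(target)})+$", "World\n") is not None)  # True
```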
Reputation: 5315
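This is a job for regular expressions. Use re.fullmatch() rather than re.match(): re.match() only anchors at the start of the string, so it would also accept a string such as "WorldHello".
import re
re.fullmatch(r"(?:World)+", "World")            # match object
re.fullmatch(r"(?:World)+", "Hello World")      # None
re.fullmatch(r"(?:World)+", "WorldWorldWorld")  # match object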
Upvotes: -1
Reputation: 532303
Use a regular expression.
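if re.fullmatch(r"(?:World)+", s):
This only succeeds if s consists of one or more repetitions of the string World, and nothing else. (re.match would also accept trailing characters, because it anchors only at the start of the string.)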
Upvotes: -1