Reputation: 4565
I am trying, but struggling, to write an algorithm that checks if a substring exists in a piece of text. The piece of text can include punctuation, but only alphanumeric characters should be considered when searching for the substring.
I would like to return the start and end index of the substring. It is guaranteed that the substring exists in the text. However, these indexes should refer to positions in the original text, so they must account for the punctuation that was ignored in the search.
For example, for the text "BD ACA;B_ 1E" and the substring "AB1", the algorithm should return 5 and 11 as the start and end index. (text[5:11] -> "A;B_ 1" == "AB1" with punctuation removed.)
This is the best I have done so far.
def search(text, sub):
    print(text, sub)
    if not sub:
        return True
    for i, char in enumerate(text):
        if char == sub[0]:
            return search(text[1:], sub[1:])
        else:
            return search(text[1:], sub)

result = search("BD ACA;B_ 1E", "AB1")
print(result)
Upvotes: 0
Views: 54
Reputation: 5668
There is a string function isalnum() that can check for alphanumeric chars.
t = 'BD ACA;B_ 1'
s = 'AB1'
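def is_in_st(t, s):
    # rfind returns -1 when the character is missing, so test for that
    # explicitly; a plain truth test would wrongly reject a match at index 0.
    start = t.rfind(s[0])
    end = t.rfind(s[-1]) + 1
    if start != -1 and end != 0:
        return s == ''.join([c for c in t[start:end] if c.isalnum()])
    return False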
is_in_st(t, s)
True
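The check above only reports whether the substring is present. To also get the start and end indexes in the original text, one option is to remember where each alphanumeric character came from. The following is a minimal sketch, not part of the answer above; the helper name find_span is hypothetical:

```python
def find_span(text, sub):
    # Keep only alphanumeric characters, remembering each one's original index.
    kept = [(i, c) for i, c in enumerate(text) if c.isalnum()]
    filtered = "".join(c for _, c in kept)

    # Search in the filtered text, where punctuation no longer gets in the way.
    pos = filtered.find(sub)
    if pos == -1 or not sub:
        return None

    # Map the first and last matched characters back to the original text.
    start = kept[pos][0]
    end = kept[pos + len(sub) - 1][0] + 1
    return start, end


print(find_span("BD ACA;B_ 1E", "AB1"))  # (5, 11)
```

This assumes sub itself contains only alphanumeric characters, as in the question.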
Upvotes: 0
Reputation: 351084
You can use a regular expression for that. The pattern [\W_]* matches any run of non-alphanumeric characters (including an empty one), so you can interleave the characters of the search string with that pattern:
import re

def search(text, sub):
    match = re.search(r"[\W_]*".join(sub), text)
    return match and match.span(0)

text = "BD ACA;B_ 1E"
span = search(text, "AB1")
if span:
    start, end = span
    print(start, end)
    print(text[start:end])
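The pattern built above relies on the search string being purely alphanumeric. If it might contain punctuation of its own, a hedged variant could drop those characters before joining. This is only a sketch under that assumption, and the name search_span is hypothetical:

```python
import re

def search_span(text, sub):
    # Ignore non-alphanumeric characters in sub, so "A-B1" behaves like "AB1".
    # re.escape is a no-op for alphanumerics; it is kept as a safeguard.
    chars = [re.escape(c) for c in sub if c.isalnum()]
    if not chars:
        return None
    match = re.search(r"[\W_]*".join(chars), text)
    return match.span() if match else None


text = "BD ACA;B_ 1E"
print(search_span(text, "A-B1"))  # (5, 11)
print(search_span(text, "XYZ"))   # None
```

Returning None explicitly for both the empty and the no-match case keeps the result easy to test with a plain `if span:`.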
Upvotes: 2