Reputation: 59
I'm wondering how to detect whether two substrings appear in a main string in a specific order. For example, if we're looking for "hours" and then "minutes" anywhere in a string, and the string is "what is 5 hours in minutes", it would return true. If the string were "what is 5 minutes in hours", it would return false.
Upvotes: 2
Views: 83
Reputation: 180391
s = "what is 5 hours in minutes"
a, b = s.find("hours"),s.find("minutes")
print(-1 < a < b)
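You could also avoid searching for b when a does not exist in the string, because a chained comparison short-circuits:
def inds(s, s1, s2):
    a = s.find(s1)
    # If a is -1, the chain stops here and s.find(s2) is never called.
    return -1 < a < s.find(s2)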
If you want the search for the second substring to start at a + 1, the change is trivial:
def inds(s, s1, s2):
    a = s.find(s1)
    return -1 < a < s.find(s2, a + 1)
But if you always want to make sure that a comes before b, then stick to the first solutions. You also did not say whether substrings inside longer words should count as matches, i.e.:
a = "foo"
b = "bar"
Would match:
"foobar"
But they are not actual words in the string. To match whole words rather than partial matches, you will need to either split and clean the text or use a regex with word boundaries:
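import re

def consec(s, *args):
    if not args:
        raise ValueError("args cannot be empty")
    pos = 0
    for w in args:
        # re.search() has no start-position argument (its third parameter is
        # flags), so compile the pattern and use Pattern.search(string, pos).
        # re.escape keeps any regex metacharacters in w literal.
        m = re.compile(r"\b{}\b".format(re.escape(w))).search(s, pos)
        if not m:
            return False
        # Continue after the end of this match so the words stay in order.
        pos = m.end()
    return True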
Which won't match "foo" and "bar" in foobar:
In [9]: consec("foobar","foo","bar")
Out[9]: False
In [10]: consec("foobar bar for bar","foo","bar")
Out[10]: False
In [11]: consec("foobar bar foo bar","foo","bar")
Out[11]: True
In [14]: consec("","foo","bar")
Out[14]: False
In [15]: consec("foobar bar foo bar","foobar","foo","bar")
Out[15]: True
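The split-and-clean alternative mentioned above could look roughly like the sketch below. The function name words_in_order is made up for illustration, and "cleaning" is assumed to mean only stripping leading and trailing ASCII punctuation from whitespace-separated tokens; real text may need more than that.

```python
import string

def words_in_order(s, *words):
    """Return True if every word appears as a whole token, in the given order."""
    # Split on whitespace and strip surrounding punctuation from each token.
    tokens = [t.strip(string.punctuation) for t in s.split()]
    pos = 0
    for word in words:
        try:
            # list.index raises ValueError when the word is absent from pos onward.
            pos = tokens.index(word, pos) + 1
        except ValueError:
            return False
    return True
```

Unlike consec, this sketch returns True when no words are passed, and it compares tokens case-sensitively.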
Upvotes: 2
Reputation: 2662
A regex will work well here. The regex r"hours.*minutes" says: look for hours, followed by zero or more of any character, followed by minutes. Also, make sure to use the search function in the regex library rather than match, as match only checks from the beginning of the string.
import re

true_state = "what is 5 hours in minutes"
false_state = "what is 5 minutes in hours"
pat = re.compile(r"hours.*minutes")
statements = [true_state, false_state]
for state in statements:
    ans = pat.search(state)
    if ans:
        # Print the whole string, then the part the pattern matched.
        print(state)
        print(ans.group())
what is 5 hours in minutes
hours in minutes
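Two caveats with the plain pattern: by default "." does not match a newline, and "hours" would also match inside a longer word. A possible way to build the pattern for arbitrary words is sketched below. The helper name ordered_pattern is hypothetical, and the whole-word restriction is an assumption that may not suit every use.

```python
import re

def ordered_pattern(*words):
    # re.escape keeps characters such as "." or "+" in a word literal.
    # \b restricts each word to whole-word matches; drop it to allow substrings.
    parts = [r"\b{}\b".format(re.escape(w)) for w in words]
    # re.DOTALL lets ".*" span newlines, which it does not do by default.
    return re.compile(".*".join(parts), re.DOTALL)

pat = ordered_pattern("hours", "minutes")
print(bool(pat.search("what is 5 hours in minutes")))   # True
print(bool(pat.search("what is 5 minutes in hours")))   # False
print(bool(pat.search("5 hours\nor 300 minutes")))      # True
```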
Upvotes: 0
Reputation: 3731
This will work with any set of words and any string:
def containsInOrder(s, *words):
    last = -1
    for word in words:
        last = s.find(word, last + 1)
        if last == -1:
            return False
    return True
Used like so:
>>> s = 'what is 5 hours in minutes'
>>> containsInOrder(s, 'hours', 'minutes')
True
>>> containsInOrder(s, 'minutes', 'hours')
False
>>> containsInOrder(s, '5', 'hours', 'minutes')
True
>>> containsInOrder('minutes hours minutes', 'hours', 'minutes')
True
>>> containsInOrder('minutes hours minutes', 'minutes', 'hours')
True
Upvotes: 1
Reputation: 86
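# s is the main string; a and b are the two substrings.
# str.index raises ValueError if a substring is missing.
if s.index(a) < s.index(b):
    result = True
else:
    result = False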
Use the index method to determine which one comes first. The if statement gives a conditional as to what you do once you find out which comes first. Do you understand what I'm trying to say?
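As a rough sketch of the index idea wrapped in a function: the name comes_before is made up here, and treating a missing substring as False is an assumption about the desired behaviour.

```python
def comes_before(s, first, second):
    """Return True if `first` occurs in `s` at a lower index than `second`."""
    try:
        return s.index(first) < s.index(second)
    except ValueError:
        # str.index raises ValueError when a substring is not found.
        return False
```

Note that this compares only the first occurrence of each substring, so comes_before('minutes hours minutes', 'hours', 'minutes') is False even though a later "minutes" follows "hours".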
Upvotes: 0
Reputation: 33285
You could use a regular expression such as "hours.*minutes", or you could use a simple string search that looks for "hours", notes the location where it is found, then does another search for "minutes" starting at that location.
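The two-step string search could be sketched like this. The function name found_in_order is hypothetical, and resuming just past the end of the first match (rather than at its start) is a choice made here so the two matches cannot overlap.

```python
def found_in_order(s, first, second):
    start = s.find(first)
    if start == -1:
        return False
    # Resume just past the first match so the two matches cannot overlap.
    return s.find(second, start + len(first)) != -1
```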
Upvotes: 0