Reputation: 3355
So for a single-word substring count in some text, I can use some_text.split().count(single_word_substring). How can I do that for a multi-word substring count in some text?
Examples:
text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'
count should be 3.
text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to'
count should be 3.
text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'go'
count should be 0.
text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'school'
count should be 3.
text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'abc-xyz'
count should be 1.
Assumption 1: Everything is lower-case.
Assumption 2: The text can contain anything.
Assumption 3: The to_be_found string can contain anything too. For example, car with 4 passengers, xyz & abc, etc.
NOTE: REGEX based solutions are acceptable. I am just curious if it's possible without regex (nice to have and just for others who may be interested in this in future).
Upvotes: 4
Views: 123
Reputation: 339
Here's a working solution using regex:
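import re

def occurrences(text, to_be_found):
    # Escape the phrase so characters such as '&' or '.' are matched literally,
    # and require that no word character sits directly before or after it.
    pattern = rf'(?<!\w){re.escape(to_be_found)}(?!\w)'
    return len(re.findall(pattern, text))
In regex, \w matches a word character (and the capital \W a non-word character, which covers spaces and other punctuation). The lookarounds (?<!\w) and (?!\w) only check that the phrase is not glued to a word character, so spaces, punctuation, and the start or end of the text all count as boundaries.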
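One thing to watch for with patterns of this kind: a pattern that consumes a \W character on each side can miss a match at the very start or end of the text, and can miss back-to-back matches that share a separator. Here is a small sketch comparing that form with a lookaround variant. The function names are hypothetical, and "word boundary" is assumed to mean "not next to a \w character".

```python
import re

def count_consuming(text, phrase):
    # Requires, and consumes, one non-word character on each side of the phrase.
    return len(re.findall(rf'\W{re.escape(phrase)}\W', text))

def count_lookaround(text, phrase):
    # Only asserts that no word character touches the phrase; nothing extra is consumed.
    return len(re.findall(rf'(?<!\w){re.escape(phrase)}(?!\w)', text))

sample = 'go go go'
print(count_consuming(sample, 'go'))   # 1: only the middle 'go' has a character on both sides
print(count_lookaround(sample, 'go'))  # 3
```

The lookaround form also leaves 'go' inside 'going' uncounted, which matches the third example in the question.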
Upvotes: 1
Reputation: 2132
I managed to make it work with this code (though it is not Pythonic at all):
text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'

def find_occurences(text, look_for):
    spec = [',', '.', '!', '?']
    where = 0
    how_many = 0
    if look_for not in text:
        return how_many
    while True:
        i = text.find(look_for, where)
        if i == -1:  # No more matches
            break
        end = i + len(look_for)
        # Treat the start and end of the text as a space so indexing never goes out of range
        before = text[i - 1] if i > 0 else " "
        after = text[end] if end < len(text) else " "
        # Count the match if it has a space on both sides, or a space or special
        # character before it and a special character such as ,.!? after it
        if (before == " " and after == " ") or ((before in spec or before == " ") and after in spec):
            how_many += 1
        where = end
    return how_many

print("'{}' was in '{}' this many times: {}".format(to_be_found, text, find_occurences(text, to_be_found)))
(before == " ") and (after == " ")
checks that the substring has a space on both sides (the start and end of the text count as a space).
((before in spec) or (before == " ")) and (after in spec)
checks that the substring is preceded by a space or a special character and followed by a special character.
Example 1:
to_be_found = 'going to school'
Output1: 3
Example 2:
to_be_found = 'going to'
Output2: 3
Example 3:
to_be_found = 'go'
Output3: 0
Example 4:
to_be_found = 'school'
Output4: 3
Upvotes: 0
Reputation: 1659
You can try this:
text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'
i = 0
r = 0
# Note: this counts plain substring matches, so 'go' would also match inside 'going'
while True:
    pos = text.find(to_be_found, i)  # Next occurrence at or after index i
    if pos < 0:
        break
    r = r + 1
    i = pos + len(to_be_found)  # Continue searching after this match
print(r)
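The loop above counts raw substrings. To also get 0 for 'go', the same find-based loop can check the characters next to each match. This is a minimal sketch without regex; count_whole_phrase is a hypothetical name, and it assumes a word character is a letter, a digit or an underscore.

```python
def count_whole_phrase(text, phrase):
    # Hypothetical helper: counts occurrences of phrase that are not glued to
    # a letter, digit or underscore on either side.
    def is_word_char(ch):
        return ch.isalnum() or ch == '_'

    if not phrase:
        return 0
    count = 0
    start = 0
    while True:
        pos = text.find(phrase, start)
        if pos == -1:
            break
        end = pos + len(phrase)
        # The start and end of the text count as valid boundaries.
        left_ok = pos == 0 or not is_word_char(text[pos - 1])
        right_ok = end == len(text) or not is_word_char(text[end])
        if left_ok and right_ok:
            count += 1
            start = end      # Continue after the counted match
        else:
            start = pos + 1  # Rejected candidate: move one character forward
    return count

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
print(count_whole_phrase(text, 'going to school'))  # 3
print(count_whole_phrase(text, 'go'))               # 0
print(count_whole_phrase(text, 'abc-xyz'))          # 1
```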
Upvotes: 0
Reputation: 21
The best native way to search for a substring is still count. It can be used with multi-word substrings, as you need:
text = 'he is going to school. abc is going to school. xyz is going to school.'
text.count('going to school') # 3
text.count('going to') # 3
text.count('school') # 3
text.count('go') # 3
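For the 'go' case, if you need 0, you can pad the search term with spaces, e.g. ' go ', so that it only matches as a separate word. Note that a space on one side alone is not always enough: ' go' still matches the start of ' going'.
You can also write your own method that searches character by character: https://stackoverflow.com/a/30863956/15080484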
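The separate-word idea can also be taken a step further without regex by comparing lists of words instead of raw characters, which extends the split() approach from the question. A minimal sketch, with hypothetical function names: it assumes words are separated by whitespace and that punctuation attached to the edges of a word should be ignored, so it will not distinguish 'school. abc' from 'school abc'.

```python
import string

def tokenize(s):
    # Split on whitespace and drop punctuation stuck to the edges of each word,
    # so 'school.' compares equal to 'school' while 'abc-xyz' stays intact.
    return [word.strip(string.punctuation) for word in s.split()]

def count_by_tokens(text, phrase):
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    # Slide a window of n words over the text and compare word lists.
    return sum(words[i:i + n] == target for i in range(len(words) - n + 1))

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
print(count_by_tokens(text, 'going to school'))  # 3
print(count_by_tokens(text, 'go'))               # 0
```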
Upvotes: 0