utengr

Reputation: 3355

Count appearances of a multi-word substring in some text

To count occurrences of a single-word substring in some text, I can use some_text.split().count(single_word_substring). How can I do the same for a multi-word substring?

Examples:

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'

count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to'

count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'go'

count should be 0.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'school'

count should be 3.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'abc-xyz'

count should be 1.

Assumption 1: Everything is lowercase. Assumption 2: The text can contain anything. Assumption 3: The substring to be found can contain anything too, for example "car with 4 passengers", "xyz & abc", etc.

NOTE: Regex-based solutions are acceptable. I am just curious whether it's possible without regex (nice to have, and useful for others who may be interested in this in the future).

Upvotes: 4

Views: 123

Answers (4)

Byron

Reputation: 339

Here's a working solution using regex:

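import re

def occurrences(text, to_be_found):
    # re.escape makes characters such as '.', '+' or '(' in to_be_found match literally;
    # the lookarounds reject a match that touches a word character on either side
    return len(re.findall(rf'(?<!\w){re.escape(to_be_found)}(?!\w)', text))

(?<!\w) and (?!\w) check that the match is not preceded or followed by a word character (a letter, digit or underscore), which covers spaces and other punctuation. Unlike putting \W on both sides, they do not consume those neighbouring characters, so a match at the very start or end of the text, or two matches separated by a single space, are still counted.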
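To illustrate the difference between wrapping the pattern in \W on both sides and using zero-width lookarounds, here is a small comparison. The sample strings are made-up edge cases (not from the question), and count_consuming / count_lookaround are hypothetical helper names used only for this sketch.

```python
import re

def count_consuming(text, phrase):
    # Requires an actual non-word character on both sides and consumes it
    return len(re.findall(rf'\W{re.escape(phrase)}\W', text))

def count_lookaround(text, phrase):
    # Only asserts "no word character" on each side; nothing is consumed,
    # and the start/end of the string also act as a boundary
    return len(re.findall(rf'(?<!\w){re.escape(phrase)}(?!\w)', text))

samples = [
    ('going to school now', 'going to school'),  # match at the very start
    ('we go to school', 'school'),               # match at the very end
    ('school school school', 'school'),          # adjacent matches share a space
]
for text, phrase in samples:
    print(repr(phrase), count_consuming(text, phrase), count_lookaround(text, phrase))
```

For the three samples this prints 0 1, 0 1 and 1 3: the \W version misses the boundary matches and, after consuming the space between two adjacent words, cannot match the next one. On the question's own examples both versions agree.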

Upvotes: 1

Jakub Szlaur
Jakub Szlaur

Reputation: 2132

I managed to make it work with this code (though it is not very Pythonic):

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'

def find_occurences(text, look_for):
    spec = [',', '.', '!', '?']  # punctuation that may directly follow a match
    where = 0
    how_many = 0

    # Use the look_for parameter (not the global to_be_found); an empty string would loop forever
    if not look_for or look_for not in text:
        return how_many

    while True:
        i = text.find(look_for, where)

        if i == -1:  # no more matches
            break

        end = i + len(look_for)
        # Treat the start and end of the text like spaces, so text[i-1] does not wrap
        # around to the last character and text[end] cannot raise IndexError
        before = text[i - 1] if i > 0 else " "
        after = text[end] if end < len(text) else " "

        if ((before == " " and after == " ")  # Check if the text is really alone
                or ((before in spec or before == " ") and after in spec)):  # or followed by ,.!?
            how_many += 1

        where = end

    return how_many

print("'{}' was in '{}' this many times: {}".format(to_be_found, text, find_occurences(text, to_be_found)))

  1. The first condition checks that the substring is surrounded by spaces on both sides.
  2. The second condition checks that the substring is preceded by a space or one of the punctuation marks in spec (, . ! ?) and followed by one of those punctuation marks, e.g. "school." at the end of a sentence.

Example 1:

to_be_found = 'going to school'
Output1: 3

Example 2:

to_be_found = 'going to'
Output2: 3

Example 3:

to_be_found = 'go'
Output3: 0

Example 4:

to_be_found = 'school'
Output4: 3
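The same str.find loop can be written more compactly without regex. This is a sketch under one assumption: instead of a fixed list of punctuation, any character that is not a letter, digit or underscore (roughly what regex \w excludes) counts as a word boundary, and the start and end of the text count as boundaries too. count_whole and is_word_char are hypothetical names.

```python
def is_word_char(ch):
    # Roughly mirrors regex \w: letters, digits and underscore
    return ch.isalnum() or ch == '_'

def count_whole(text, phrase):
    """Count non-overlapping occurrences of phrase that stand alone as words."""
    if not phrase:
        return 0
    count = 0
    start = 0
    while True:
        i = text.find(phrase, start)
        if i == -1:
            return count
        end = i + len(phrase)
        # The start/end of the text are treated as boundaries
        left_ok = i == 0 or not is_word_char(text[i - 1])
        right_ok = end == len(text) or not is_word_char(text[end])
        if left_ok and right_ok:
            count += 1
            start = end    # skip past a counted match
        else:
            start = i + 1  # rejected: retry one character later

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
print(count_whole(text, 'school'))   # 3
print(count_whole(text, 'go'))       # 0
print(count_whole(text, 'abc-xyz'))  # 1
```

One consequence of this boundary rule: 'abc' would also be found inside 'abc-xyz', because '-' is not a word character. The spec-list version above rejects that case, so pick whichever rule matches your data.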

Upvotes: 0

Belhadjer Samir
Belhadjer Samir

Reputation: 1659

You can try this:

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'
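i = 0  # position to resume searching from
r = 0  # number of matches found
while True:
    pos = text.find(to_be_found, i)
    if pos < 0:  # no more occurrences
        break
    r += 1
    i = pos + len(to_be_found)  # continue after this match (non-overlapping)

# Note: this counts raw substrings, so 'go' would also match inside 'going'
print(r)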
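Wrapping the loop above in a function makes it easy to compare with the built-in str.count. This is a sketch: count_non_overlapping is a hypothetical name, and the empty-string guard is an addition (without it, an empty to_be_found makes the loop run forever because i never advances).

```python
def count_non_overlapping(text, phrase):
    # Same find-and-skip loop as above, wrapped in a function
    if not phrase:
        raise ValueError("phrase must not be empty")  # avoids an endless loop
    count = 0
    i = 0
    while True:
        pos = text.find(phrase, i)
        if pos < 0:
            return count
        count += 1
        i = pos + len(phrase)

text = 'he is going to school. abc is going to school. xyz is going to school.'
for phrase in ['going to school', 'going to', 'go']:
    print(phrase, count_non_overlapping(text, phrase), text.count(phrase))
```

Both counts agree for every phrase, including 3 for 'go', so this loop behaves like str.count and does not by itself give the 0 the question asks for.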

Upvotes: 0

jeffry_bo
jeffry_bo

Reputation: 21

  1. The simplest built-in way to count a substring is still str.count, and it works with multi-word substrings as well:

    text = 'he is going to school. abc is going to school. xyz is going to school.'
    text.count('going to school') # 3
    text.count('going to') # 3
    text.count('school') # 3
    text.count('go') # 3
    

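    For the 'go' case, if you need 0, search with a space on both sides, e.g. text.count(' go '), so that only a separate word matches. Padding one side is not enough here: text.count(' go') is still 3 because it matches the start of ' going'. Also note that ' go ' misses a word at the very start or end of the text, or one directly followed by punctuation.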

  2. You can also write your own character-by-character search method, as in https://stackoverflow.com/a/30863956/15080484
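Another non-regex option extends the question's split().count() idea to phrases: split the text into words, strip punctuation from the ends of each word, and compare every run of consecutive words with the phrase. This is a sketch; count_phrase and the PUNCTUATION set are hypothetical choices, not part of the answer above.

```python
# Characters stripped from the ends of each word (an assumption; adjust as needed)
PUNCTUATION = '.,!?;:'

def count_phrase(text, phrase):
    # 'school.' -> 'school'; inner characters such as '-' in 'abc-xyz' are kept
    words = [w.strip(PUNCTUATION) for w in text.split()]
    target = phrase.split()
    n = len(target)
    if n == 0:
        return 0
    # Compare every window of n consecutive words with the target phrase
    return sum(1 for k in range(len(words) - n + 1) if words[k:k + n] == target)

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
print(count_phrase(text, 'going to school'))  # 3
print(count_phrase(text, 'go'))               # 0
print(count_phrase(text, 'abc-xyz'))          # 1
```

Because it works on words, this also tolerates extra spaces or newlines between words. Unlike str.count or re.findall, it counts overlapping occurrences (count_phrase('a a a', 'a a') is 2), and a phrase that itself ends in punctuation, such as 'school.', will not match.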

Upvotes: 0
