Fatima Mustafa

Reputation: 55

Python: how to count occurrences of a sentence in a list

I'm writing a function that counts the number of times a word occurs in a list of elements read from a text file, which is straightforward.

However, I have spent two days trying to figure out how to count occurrences of a string that contains multiple words (two or more).

For example, say the string is:

"hello bye"

and the list is:

["car", "hello","bye" ,"hello"]

The function should return 1, because "hello" immediately followed by "bye" occurs only once.


The closest I've gotten to the solution is using

words[0:2] = [' '.join(words[0:2])]

which joins two elements into one, given their indices. However, this doesn't solve my problem, because the input is the element itself rather than an index.

Can someone point me in the right direction?

Upvotes: 1

Views: 931

Answers (4)

Bill Bell

Reputation: 21643

Two possibilities.

## laboriously

lookFor = 'hello bye'
words = ["car", "hello","bye" ,"hello", 'tax', 'hello', 'horn', 'hello', 'bye']

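strungOutWords = ' '.join(words)

# Note: this matches substrings, so 'othello bye' would also be counted
count = 0
p = 0
while True:
    # str.find takes a start index, so the string need not be sliced
    q = strungOutWords.find(lookFor, p)
    if q == -1:
        break
    p = q + 1  # resume one character past this match
    count += 1

print(count)  # 2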

## using a regex

import re
# re.escape treats lookFor literally; \b keeps matches on whole words
print(len(re.findall(r'\b' + re.escape(lookFor) + r'\b', strungOutWords)))  # 2
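The two approaches above differ when the phrase can overlap itself. The `find` loop advances only one character past each hit, so it counts overlapping matches, while `re.findall` counts non-overlapping ones; wrapping the pattern in a lookahead makes the regex count overlaps too. A small sketch, assuming a made-up three-word list and the phrase "hello hello"; `count_find` is a hypothetical helper that wraps the loop above:

```python
import re

def count_find(text, phrase):
    """Hypothetical helper: count hits with str.find, stepping one char past each."""
    count, p = 0, 0
    while True:
        q = text.find(phrase, p)
        if q == -1:
            return count
        p = q + 1
        count += 1

text = " ".join(["hello", "hello", "hello"])  # "hello hello hello"
phrase = "hello hello"

print(count_find(text, phrase))                              # 2 (overlapping)
print(len(re.findall(re.escape(phrase), text)))              # 1 (non-overlapping)
print(len(re.findall("(?=%s)" % re.escape(phrase), text)))   # 2 (lookahead allows overlap)
```

The overlapping count of 2 agrees with checking every window of two consecutive words, which is what the other answers do.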

Upvotes: 1

kardaj

Reputation: 1935

I would suggest reducing the problem to counting occurrences of one string within another.

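import re

words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
# Flatten every element into single words and pad with spaces, so that
# a match always starts and ends on a word boundary
my_text = " %s " % " ".join(item for x in words for item in x.split())


def count(sentence):
    my_sentence = " %s " % " ".join(sentence.split())
    # str.count skips overlapping matches (" bye hello " in " bye hello bye hello "
    # gives 1 because the shared space is consumed), so count via a lookahead
    return len(re.findall("(?=%s)" % re.escape(my_sentence), my_text))


print(count("hello bye"))  # 2
print(count("pet shop"))   # 0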
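The spaces padded around both `my_text` and `my_sentence` are what keep matches on whole words. A minimal illustration using words from the list above, counted without and with the padding:

```python
text = " ".join(["hello", "carpet", "shoplifter"])  # "hello carpet shoplifter"
padded = " %s " % text                               # " hello carpet shoplifter "

print(text.count("car"))             # 1: "car" is found inside "carpet"
print(padded.count(" car "))         # 0: the spaces force a whole-word match
print(text.count("pet shop"))        # 1: spans the end of "carpet" and start of "shoplifter"
print(padded.count(" pet shop "))    # 0
```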

Upvotes: 0

Let's split this problem into two parts. First, we define a function that returns the n-grams of a given list, that is, the sublists of n consecutive elements:

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))
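To see why this works, here are the intermediate values for n = 2 on the question's list (a sketch for illustration only):

```python
words = ["car", "hello", "bye", "hello"]
n = 2

# The comprehension builds n copies of the list, each shifted one step further
shifted = [words[i:] for i in range(n)]
print(shifted)
# [['car', 'hello', 'bye', 'hello'], ['hello', 'bye', 'hello']]

# zip reads the copies in parallel and stops at the shortest one,
# so each tuple holds n consecutive words
print(list(zip(*shifted)))
# [('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
```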

We can now get 2, 3 or 4-grams easily:

>>> ngrams(["car", "hello","bye" ,"hello"], 2)
[('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 3)
[('car', 'hello', 'bye'), ('hello', 'bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 4)
[('car', 'hello', 'bye', 'hello')]

Each n-gram is returned as a tuple.

Now make the phrase 'hello bye' into a tuple:

>>> as_tuple = tuple('hello bye'.split())
>>> as_tuple
('hello', 'bye')
>>> len(as_tuple)
2

Since the phrase has 2 words, we generate bigrams from the word list and count how many of them equal the phrase tuple. Generalizing this to phrases of any length:

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

def count_occurrences(words, phrase):
    # words: the list to search; phrase: a space-separated string
    phrase_as_tuple = tuple(phrase.split())
    word_ngrams = ngrams(words, len(phrase_as_tuple))
    return word_ngrams.count(phrase_as_tuple)

print(count_occurrences(["car", "hello", "bye", "hello"], 'hello bye'))
# prints 1
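If you need to look up many phrases in one list, one option (a sketch, not part of the answer above) is to build the n-grams once and tally them with `collections.Counter`, so each lookup is a dictionary access. This assumes every queried phrase has the same number of words as the n used to build the counter; the word list below is made up:

```python
from collections import Counter

def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

words = ["car", "hello", "bye", "hello", "bye"]
bigram_counts = Counter(ngrams(words, 2))  # tally every 2-word window once

print(bigram_counts[tuple("hello bye".split())])  # 2
print(bigram_counts[tuple("car bye".split())])    # 0: Counter returns 0 for missing keys
```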

Upvotes: 1

Moinuddin Quadri

Reputation: 48057

Compare the string with every run of consecutive elements in the list, joined with spaces. Here is sample code:

my_list = ["car", "hello","bye" ,"hello"]
sentence = "hello bye"
word_count = len(sentence.split())
c = 0

for i in range(len(my_list) - word_count + 1):
    if sentence == ' '.join(my_list[i:i+word_count]):
        c+=1

The final value held by c is:

>>> c
1

If you are looking for a one-liner, you can use zip and sum:

>>> my_list = ["car", "hello","bye" ,"hello"]
>>> sentence = "hello bye"
>>> words = sentence.split()

>>> sum(1 for i in zip(*[my_list[j:] for j in range(len(words))]) if list(i) == words)
1

Upvotes: 1
