Reputation: 55
I'm writing a function that counts how many times a word occurs in a list of elements read from a text file, which is pretty straightforward.
However, I have spent two days trying to figure out how to count occurrences of a string that contains multiple words (two or more).
For example, say the string is:
"hello bye"
and the list is:
["car", "hello","bye" ,"hello"]
The function should return the value 1
because the elements "hello" and "bye" only occur once consecutively.
The closest I've gotten to the solution is using
words[0:2] = [' '.join(words[0:2])]
which joins two elements together given their indices. However, this is wrong because the input will be the elements themselves rather than an index.
Can someone point me in the right direction?
Upvotes: 1
Views: 931
Reputation: 21643
Two possibilities.
## laboriously
lookFor = 'hello bye'
words = ["car", "hello","bye" ,"hello", 'tax', 'hello', 'horn', 'hello', 'bye']
strungOutWords = ' '.join(words)
count = 0
p = 0
while True:
    q = strungOutWords[p:].find(lookFor)
    if q == -1:
        break
    else:
        p = p + q + 1
        count += 1
print(count)
## using a regex
import re
print(len(re.findall(re.escape(lookFor), strungOutWords)))
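One caveat with both variants above: they match characters in the joined string, so 'hello bye' would also be found inside 'hello byebye'. A minimal sketch of a word-boundary-aware regex version follows, assuming the phrase is non-empty and the list elements contain no whitespace. The name `count_phrase_regex` is hypothetical and not part of the original answer.

```python
import re

def count_phrase_regex(words, phrase):
    """Count occurrences of `phrase` as whole consecutive words in `words`."""
    text = " ".join(words)
    # (?<!\S): the match must start at the beginning or right after whitespace.
    # (?!\S):  the match must end at the end or right before whitespace.
    # Wrapping the phrase in a lookahead makes each match zero-width,
    # so overlapping occurrences are counted as well.
    pattern = r"(?<!\S)(?=" + re.escape(phrase) + r"(?!\S))"
    return len(re.findall(pattern, text))

words = ["car", "hello", "bye", "hello", "tax", "hello", "horn", "hello", "bye"]
print(count_phrase_regex(words, "hello bye"))
```

The `re.escape` call keeps characters such as `.` or `?` in the phrase from being interpreted as regex syntax.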
Upvotes: 1
Reputation: 1935
I would suggest reducing the problem into counting occurrences of a string within another string.
words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
sentence = "hello bye"
my_text = " %s " % " ".join([item for sublist in [x.split() for x in words] for item in sublist])
def count(sentence):
    my_sentence = " %s " % " ".join(sentence.split())
    return my_text.count(my_sentence)
print(count("hello bye"))  # prints 2
print(count("pet shop"))   # prints 0
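One thing to be aware of with this approach: `str.count` counts non-overlapping matches, and two adjacent matches share the padding space, so a phrase that repeats back to back can be undercounted. Below is a rough sketch of a variant that advances one character at a time with `str.find` instead. The name `count_overlapping` is hypothetical, and passing the word list as a parameter is an assumption made here for self-containment.

```python
def count_overlapping(words, phrase):
    """Count padded occurrences of `phrase`, including adjacent repeats."""
    if not phrase.split():
        return 0  # an empty phrase has nothing to match
    # Same normalisation as the answer: split every element, re-join, pad.
    text = " %s " % " ".join(w for item in words for w in item.split())
    needle = " %s " % " ".join(phrase.split())
    total = 0
    start = text.find(needle)
    while start != -1:
        total += 1
        # Move forward by one character so the shared space can be reused.
        start = text.find(needle, start + 1)
    return total

words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
print(count_overlapping(words, "hello bye"))
```

For the list in the answer both versions agree; they differ only for inputs such as `["a", "a"]` with the phrase `"a"`.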
Upvotes: 0
Reputation: 133849
Let's split this problem into two parts. First, we write a function that returns the n-grams of a given list, that is, the sublists of n consecutive elements:
def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))
We can now get 2, 3 or 4-grams easily:
>>> ngrams(["car", "hello","bye" ,"hello"], 2)
[('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 3)
[('car', 'hello', 'bye'), ('hello', 'bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 4)
[('car', 'hello', 'bye', 'hello')]
Each item is made into a tuple.
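To see why the `zip` trick produces n-grams, it may help to look at the intermediate step. The sketch below just unpacks the one-liner for n = 2; the variable names `shifted` and `bigrams` are illustrative only.

```python
words = ["car", "hello", "bye", "hello"]
n = 2

# One copy of the list per offset: the original, then the list shifted by one.
shifted = [words[i:] for i in range(n)]
# shifted[0] is ['car', 'hello', 'bye', 'hello']
# shifted[1] is ['hello', 'bye', 'hello']

# zip pairs up elements at the same position and stops at the shortest list,
# so each tuple holds n consecutive words from the original list.
bigrams = list(zip(*shifted))
print(bigrams)
```

Because `zip` stops at the shortest input, the trailing elements that cannot start a full n-gram are dropped automatically.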
Now make the phrase 'hello bye' into a tuple:
>>> as_tuple = tuple('hello bye'.split())
>>> as_tuple
('hello', 'bye')
>>> len(as_tuple)
2
Since this has 2 words, we need to generate bigrams from the sentence and count the number of bigrams that match. We can generalize all this to:
def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))

def count_occurrences(sentence, phrase):
    phrase_as_tuple = tuple(phrase.split())
    sentence_ngrams = ngrams(sentence, len(phrase_as_tuple))
    return sentence_ngrams.count(phrase_as_tuple)

print(count_occurrences(["car", "hello","bye" ,"hello"], 'hello bye'))
# prints 1
Upvotes: 1
Reputation: 48057
Match the string with the join of the consecutive elements in the main list. Below is the sample code:
my_list = ["car", "hello","bye" ,"hello"]
sentence = "hello bye"
word_count = len(sentence.split())
c = 0
for i in range(len(my_list) - word_count + 1):
    if sentence == ' '.join(my_list[i:i+word_count]):
        c += 1
The final value held by c will be:
>>> c
1
If you are looking for a one-liner, you may use zip and sum:
>>> my_list = ["car", "hello","bye" ,"hello"]
>>> sentence = "hello bye"
>>> words = sentence.split()
>>> sum(1 for i in zip(*[my_list[j:] for j in range(len(words))]) if list(i) == words)
1
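If this needs to be reused, the loop above could be wrapped in a function along the following lines. This is only a sketch: the name `count_phrase` is hypothetical, and it compares list slices directly rather than joined strings, which is a small deviation from the code above (a single element such as "hello bye" that itself contains a space is then not counted as a match).

```python
def count_phrase(words, phrase):
    """Count how often the words of `phrase` appear consecutively in `words`."""
    target = phrase.split()
    n = len(target)
    if n == 0:
        return 0  # nothing to look for
    # Slide a window of n elements over the list and compare it to the target.
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)

print(count_phrase(["car", "hello", "bye", "hello"], "hello bye"))
```

When the phrase is longer than the list, the `range` is empty and the function returns 0.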
Upvotes: 1