Lucas Mengual

Reputation: 415

Find strings from one list inside another list of strings

I need help looping through a list of sentences/strings and erasing the characters that follow a match, based on another list of words.

sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow']

words = ['lucas mangulu', 'george smith', 'miyagi chung']

I know I have to loop over each element in the sentences list, but I'm stuck on how to use find() to locate, for example, the element at the same position in the words list inside each sentence. The final result should be:

sentences = ['im not george smith my name is',
             'how shall i call you',
             'we have detected a']

#OR

sentences = ['im not george smith my name is lucas mangulu',
             'how shall i call you george smith',
             'we have detected a miyagi chung']

Upvotes: 0

Views: 41

Answers (2)

You give two example outputs for one input, which is extremely confusing. The following code may help you but I can't logically figure out how to match your example exactly.

That being said I have a hunch this is what you are looking for.

import re
sentences = ['im not george smith my name is lucas mangulu thank you',
             'how shall i call you george smith oh okay got it',
             'we have detected a miyagi chung in the traffic flow',
             'Is this valid?']

words = ['lucas mangulu', 'george smith', 'miyagi chung', 'test']
occurrences = []
# Escape each entry so it is matched literally, and build the pattern once
pattern = re.compile('|'.join(re.escape(word) for word in words))
for sentence in sentences:
    # If you want to find all occurrences in a sentence, this line will help you
    # occurrences.append([(x.start(), x.end(), x.group()) for x in pattern.finditer(sentence)])

    # Look for a word in this sentence (the leftmost match of any word in the list)
    search_result = pattern.search(sentence)
    # If we found a word in this sentence
    if search_result:
        occurrences.append((search_result.start(), search_result.end(), search_result.group()))
    else:  # No word found: both slices below yield an empty string
        occurrences.append((0, 0, None))

# Example output 1:
# oc in this case is (start_index, end_index, word_found) for each sentence.
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[1]])

# Example output 2:
for index, oc in enumerate(occurrences):
    print(sentences[index][:oc[0]])

Example output 1:

im not george smith
how shall i call you george smith
we have detected a miyagi chung

Example output 2:

im not
how shall i call you
we have detected a
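
If the intent is instead to pair each sentence with the word at the same index (which would reproduce both outputs shown in the question), a minimal sketch with zip() and str.find() could look like the following. This is only one reading of the question; cut_at_word and its keep_word flag are hypothetical names introduced here for illustration.

```python
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]
words = ['lucas mangulu', 'george smith', 'miyagi chung']


def cut_at_word(sentence, word, keep_word=False):
    """Truncate sentence at the first occurrence of word (hypothetical helper)."""
    start = sentence.find(word)
    if start == -1:  # word not present: leave the sentence unchanged
        return sentence
    # Cut either right after the word or right before it
    end = start + len(word) if keep_word else start
    return sentence[:end].rstrip()


# Assumption: sentence i is paired with word i, as the expected output suggests
without_word = [cut_at_word(s, w) for s, w in zip(sentences, words)]
with_word = [cut_at_word(s, w, keep_word=True) for s, w in zip(sentences, words)]

print(without_word)
print(with_word)
```

Note that zip() stops at the shorter of the two lists, so any sentence without a partner word would be dropped from the result.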

Upvotes: 0

Ralf

Reputation: 16515

I have difficulties understanding exactly what you are looking for, but here is a simple idea to remove the strings in words from the strings in sentences; it uses many calls to str.replace().

>>> words = ['lucas mangulu', 'george smith', 'miyagi chung']
>>> original_sentences = [
...     'im not george smith my name is lucas mangulu thank you',
...     'how shall i call you george smith oh okay got it',
...     'we have detected a miyagi chung in the traffic flow',
... ]
>>> original_sentences
['im not george smith my name is lucas mangulu thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']

>>> sentences = list(original_sentences)                  # make a copy
>>> for i in range(len(sentences)):
...     for w in words:                                   # remove words
...         sentences[i] = sentences[i].replace(w, '')
...     while '  ' in sentences[i]:                       # remove double whitespaces
...         sentences[i] = sentences[i].replace('  ', ' ')
>>> sentences
['im not my name is thank you',
 'how shall i call you oh okay got it',
 'we have detected a in the traffic flow']

Is this what you intended to do?


If you only want to remove one word from all the sentences, you can drop the nested for loop:

>>> sentences = list(original_sentences)                  # make a copy
>>> word_to_remove = words[0]                             # pick one
>>> for i in range(len(sentences)):
...     sentences[i] = sentences[i].replace(word_to_remove, '')
>>> sentences
['im not george smith my name is  thank you',
 'how shall i call you george smith oh okay got it',
 'we have detected a miyagi chung in the traffic flow']
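
As a possible alternative to the repeated str.replace() calls, the removal of all words can be sketched with a single compiled regular expression. This assumes the entries in words should be matched literally (hence re.escape); remove_words is a hypothetical helper name, not something from the question.

```python
import re

words = ['lucas mangulu', 'george smith', 'miyagi chung']
sentences = [
    'im not george smith my name is lucas mangulu thank you',
    'how shall i call you george smith oh okay got it',
    'we have detected a miyagi chung in the traffic flow',
]

# One alternation pattern covering every word, each escaped to match literally
pattern = re.compile('|'.join(re.escape(w) for w in words))


def remove_words(sentence):
    """Delete every listed word, then collapse runs of whitespace (hypothetical helper)."""
    stripped = pattern.sub('', sentence)
    # split() with no arguments drops all whitespace runs; join() restores single spaces
    return ' '.join(stripped.split())


cleaned = [remove_words(s) for s in sentences]
print(cleaned)
```

Building a new list this way leaves the original sentences untouched, so no explicit copy is needed.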

Upvotes: 1
