Arash Howaida

Reputation: 2617

python - replace string within larger corpus

I'm looking for a native Python solution that lets me replace phrases wherever they appear within a list of strings. The data looks like this:

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this gibson model or this PRS electric','guitar','fender guitar','PRS electric',...]

I want to locate exact phrases in text_array and replace them according to the mapping in a dict that I'm calling thesaurus:

thesaurus = {'gibson model':'guitar', 'fender guitar':'guitar', 'PRS electric':'guitar'}

Question

How would I iterate over each element of text_array and replace every occurrence, wherever it appears, of the phrases listed in thesaurus? (Note: I only want to replace exact matches and leave the rest of the string intact.)

Desired output:

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this guitar or this guitar', 'guitar','guitar','guitar']

Upvotes: 0

Views: 120

Answers (6)

Devang Sanghani

Reputation: 780

Here's mine:

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this gibson model or this PRS electric','guitar','fender guitar','PRS electric',]
thesaurus = {'gibson model':'guitar', 'fender guitar':'guitar', 'PRS electric':'guitar'}


# Apply every thesaurus entry to every string, updating the list in place
for i in range(len(text_array)):
    for x, y in thesaurus.items():
        text_array[i] = text_array[i].replace(x, y)

print(text_array)

Output:

['the store has a piano', 'dulcimer players are popular with the ladies', 'guitar', 'rock legends dont shy away from this guitar or this guitar', 'guitar', 'guitar', 'guitar']
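One caveat worth checking against the "exact matches" requirement: str.replace matches substrings, so a phrase is also replaced when it sits inside a longer word. A minimal sketch, assuming "exact" means whole words; the sample string is a made-up input, not part of the question:

```python
thesaurus = {'gibson model': 'guitar', 'fender guitar': 'guitar', 'PRS electric': 'guitar'}

# Hypothetical input: the key 'PRS electric' appears inside 'PRS electrics'
sample = 'he collects PRS electrics'

for phrase, replacement in thesaurus.items():
    sample = sample.replace(phrase, replacement)

print(sample)  # he collects guitars
```

If that behaviour is unwanted, a pattern with word-boundary checks is probably the safer choice.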

Upvotes: 1

user7864386

Reputation:

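Assuming each string contains at most one thesaurus phrase, we can use a generator expression inside next to find the first matching key in thesaurus. Only that one key is replaced, so a string containing two different phrases keeps the second one unchanged.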

If you want to change the original list:

for i, text in enumerate(text_array):
    m = next(((k,v) for k,v in thesaurus.items() if k in text), None)
    if m:
        text_array[i] = text.replace(m[0], m[1])

If you want to create a new list:

out = []  # collect the results here instead of modifying text_array
for text in text_array:
    m = next(((k,v) for k,v in thesaurus.items() if k in text), None)
    if m:
        text = text.replace(m[0], m[1])
    out.append(text)

You can also use pandas:

import pandas as pd
s = pd.Series(text_array)
msk = s.str.contains('|'.join(thesaurus))
s[msk] = s[msk].replace(thesaurus, regex=True)
out = s.tolist()

Output:

['the store has a piano',
 'dulcimer players are popular with the ladies',
 'guitar',
 'rock legends dont shy away from this guitar or this guitar',
 'guitar',
 'guitar',
 'guitar']
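To see what the single-match assumption means for the next-based loops, here is a small sketch that compares replacing only the first matching key with replacing every key. The variable names first_only and all_keys are illustrative only:

```python
thesaurus = {'gibson model': 'guitar', 'fender guitar': 'guitar', 'PRS electric': 'guitar'}
text = 'rock legends dont shy away from this gibson model or this PRS electric'

# First matching key only, as in the next-based loops
m = next(((k, v) for k, v in thesaurus.items() if k in text), None)
first_only = text.replace(m[0], m[1]) if m else text

# Every key in turn
all_keys = text
for k, v in thesaurus.items():
    all_keys = all_keys.replace(k, v)

print(first_only)  # rock legends dont shy away from this guitar or this PRS electric
print(all_keys)    # rock legends dont shy away from this guitar or this guitar
```

The two agree whenever a string contains at most one phrase from thesaurus.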

Upvotes: 1

vbarbosavaz

Reputation: 96

Using regular expressions:

import re

text_array = [
    'the store has a piano',
    'dulcimer players are popular with the ladies',
    'guitar',
    'rock legends dont shy away from this gibson model or this PRS electric',
    'guitar',
    'fender guitar',
    'PRS electric'
]

thesaurus = {
    'gibson model':'guitar',
    'fender guitar':'guitar',
    'PRS electric':'guitar'
}

pattern = re.compile(r'(?<!\w)(' + '|'.join(re.escape(key) for key in thesaurus.keys()) + r')(?!\w)')

# Replace each whole-phrase match with its thesaurus value, in place
for i, sentence in enumerate(text_array):
    text_array[i] = pattern.sub(lambda x: thesaurus[x.group()], sentence)

print(text_array)

Output:

['the store has a piano', 'dulcimer players are popular with the ladies', 'guitar', 'rock legends dont shy away from this guitar or this guitar', 'guitar', 'guitar', 'guitar']
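The same idea can be sketched as a reusable helper that returns a new list. The name replace_phrases is hypothetical, and sorting the keys longest-first is an added assumption: it only matters when one key is a prefix of another, which is not the case in the question's data.

```python
import re

def replace_phrases(texts, mapping):
    """Return a new list with every whole-phrase key of mapping replaced."""
    if not mapping:
        return list(texts)
    # Longest keys first, so a shorter overlapping key cannot win the alternation
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r'(?<!\w)(?:' + '|'.join(re.escape(k) for k in keys) + r')(?!\w)'
    )
    return [pattern.sub(lambda m: mapping[m.group()], t) for t in texts]

text_array = [
    'the store has a piano',
    'guitar',
    'rock legends dont shy away from this gibson model or this PRS electric',
    'fender guitar',
]
thesaurus = {'gibson model': 'guitar', 'fender guitar': 'guitar', 'PRS electric': 'guitar'}

result = replace_phrases(text_array, thesaurus)
print(result)
```

Because of the lookarounds, a string such as 'PRS electrics' is left alone, since the phrase is followed by another word character.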

Upvotes: 0

Benjamin Ruck

Reputation: 191

Here is my approach. It builds a new list and leaves the original text_array unchanged.

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this gibson model or this PRS electric','guitar','fender guitar','PRS electric']
thesaurus = {'gibson model':'guitar', 'fender guitar':'guitar', 'PRS electric':'guitar'}

res = []
for text in text_array:
    for key in thesaurus:
        text = text.replace(key, thesaurus[key])
    res.append(text)
print(res)

Upvotes: 2

Manish Shetty

Reputation: 144

Use this code:

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this gibson model or this PRS electric','guitar','fender guitar','PRS electric']
thesaurus = {'gibson model':'guitar', 'fender guitar':'guitar', 'PRS electric':'guitar'}
# For each phrase, sweep the whole list and replace it in place
for key in thesaurus.keys():
    for i, item in enumerate(text_array):
        text_array[i] = item.replace(key, thesaurus[key])
print(text_array)

Result:

['the store has a piano', 'dulcimer players are popular with the ladies', 'guitar', 'rock legends dont shy away from this guitar or this guitar', 'guitar', 'guitar', 'guitar']

Upvotes: 1

DilLip_Chowdary

Reputation: 1191

You can use the code snippet below to get the expected output:

text_array = ['the store has a piano','dulcimer players are popular with the ladies','guitar','rock legends dont shy away from this gibson model or this PRS electric','guitar','fender guitar','PRS electric']

thesaurus = {'gibson model':'guitar', 'fender guitar':'guitar', 'PRS electric':'guitar'}


for index, val in enumerate(text_array):
    for key, replacement in thesaurus.items():
        # Check whether the phrase occurs in this list item
        if key in val:
            # Update the list item in place
            text_array[index] = text_array[index].replace(key, replacement)

print(text_array)

Upvotes: 1
