Lukasz
Lukasz

Reputation: 2606

Replacing entire string if there is a matching word

I've got a large list of words that I'm trying to cleanup. A number of those words appear multiple times written a little differently every time and I would like to normalize them. For instance I would like to replace the following words:

list = ["resident super", "super live in", "on site superintendent in building", "livein super", "residential super", "superintendent lives in", "on-site super"...]

with just superintendent

I figured I could do this with

for item in list:
    re.sub("resident super|super live in|on site superintendent in building| livein super|residential super|superintendent lives in|on-site super", 
           "superintendent", list)

but I'm certain to miss some entries. All the entries include the word super, but is there a way to have a regex rule that will replace the entire item with the desired word?

Upvotes: 0

Views: 6782

Answers (4)

Jan
Jan

Reputation: 43169

Shortest with a list comprehension:

lst = ["resident super", "super live in", "on site superintendent in building", 
"livein super", "residential super", "superintendent lives in", "on-site super", "mega-intendent"]

new_lst = ['superintendent' if 'super' in item else item 
            for item in lst]

print(new_lst)
# ['superintendent', 'superintendent', 'superintendent', 'superintendent', 'superintendent', 'superintendent', 
# 'superintendent', 'mega-intendent']

Upvotes: 1

arshbot
arshbot

Reputation: 15815

I might be misunderstanding your question, but couldn't you use in instead? This doesn't seem to warrant regex as regex is significantly slower.

For instance:

i=0
while i < len(list):
    if 'super' in list[i]:
        list[i] = 'superintendant'
    i+=1

This will replace everything in your list that contains super with superintendent

Upvotes: 1

staad
staad

Reputation: 836

I'm not sure if I understand your question but if you want to replace every element with the word super in it by superintendant, here's what I would do.

for index,element in enumerate(listToCheck):
    if "super" in element:
        listToCheck[index]="superintendant"

By the way don't name your variables list because it's a reserved python keyword.

Upvotes: 1

Blckknght
Blckknght

Reputation: 104712

The re.sub method doesn't replace a string in place. It can't, since strings in Python are immutable. When you do a substitution on a string, it returns a new string with the requested changes (or the original string if there was no match). You're currently ignoring the return value, so your code has no effect.

But I don't think you actually need regular expressions at all for this issue. If you want to replace any string that mentions the word super anywhere with the string "superintendent", you can use a simple substring test:

for i, item in enumerate(list_of_strings):
    if "super" in item:
        list_of_strings[i] = "superintendent"

This will of course be more prone to false-positives than using your current regex. You could still use the structure of the code above with a regex search if you want (just change the if "super" in item: line to if re.search(pattern, item): after setting pattern to a regex that matches the strings you want it to).

Upvotes: 1

Related Questions