Reputation: 2606
I've got a large list of words that I'm trying to cleanup. A number of those words appear multiple times written a little differently every time and I would like to normalize them. For instance I would like to replace the following words:
list = ["resident super", "super live in", "on site superintendent in building", "livein super", "residential super", "superintendent lives in", "on-site super"...]
with just superintendent
I figured I could do this with
for item in list:
re.sub("resident super|super live in|on site superintendent in building| livein super|residential super|superintendent lives in|on-site super",
"superintendent", list)
but I'm certain to miss some entries. All the entries include the word super
, but is there a way to have a regex rule that will replace the entire item with the desired word?
Upvotes: 0
Views: 6782
Reputation: 43169
Shortest with a list comprehension:
lst = ["resident super", "super live in", "on site superintendent in building",
"livein super", "residential super", "superintendent lives in", "on-site super", "mega-intendent"]
new_lst = ['superintendent' if 'super' in item else item
for item in lst]
print(new_lst)
# ['superintendent', 'superintendent', 'superintendent', 'superintendent', 'superintendent', 'superintendent',
# 'superintendent', 'mega-intendent']
Upvotes: 1
Reputation: 15815
I might be misunderstanding your question, but couldn't you use in
instead? This doesn't seem to warrant regex as regex is significantly slower.
For instance:
i=0
while i < len(list):
if 'super' in list[i]:
list[i] = 'superintendant'
i+=1
This will replace everything in your list that contains super
with superintendent
Upvotes: 1
Reputation: 836
I'm not sure if I understand your question but if you want to replace every element with the word super
in it by superintendant
, here's what I would do.
for index,element in enumerate(listToCheck):
if "super" in element:
listToCheck[index]="superintendant"
By the way don't name your variables list
because it's a reserved python keyword.
Upvotes: 1
Reputation: 104712
The re.sub
method doesn't replace a string in place. It can't, since strings in Python are immutable. When you do a substitution on a string, it returns a new string with the requested changes (or the original string if there was no match). You're currently ignoring the return value, so your code has no effect.
But I don't think you actually need regular expressions at all for this issue. If you want to replace any string that mentions the word super
anywhere with the string "superintendent"
, you can use a simple substring test:
for i, item in enumerate(list_of_strings):
if "super" in item:
list_of_strings[i] = "superintendent"
This will of course be more prone to false-positives than using your current regex. You could still use the structure of the code above with a regex search if you want (just change the if "super" in item:
line to if re.search(pattern, item):
after setting pattern
to a regex that matches the strings you want it to).
Upvotes: 1