Jim

Reputation: 23

Replace Acronyms with their values Python

I'm working on cleaning some text that contains a lot of acronyms. I have made a dictionary that maps a few example acronyms to their values, but I am running into a problem with it. Example code below:

    def acr(text):
        acr_dict = {'ft': 'feet',
                    'mi': 'michigan'}

        for word, abr in acr_dict.items():
            text = text.replace(word.lower(), abr)
        return text

The logic works, but when the letters of an acronym also appear inside other words, those words get changed too:

ex: print(acr('I like milk and live in mi'))

output --> I like michiganlk and live in michigan

Any advice on how to not have it look for the acronym letters within other words?

Upvotes: 2

Views: 1007

Answers (2)

jacob

Reputation: 4957

One potential solution (assuming words are separated by plain whitespace) is to split the string into words, compare each one against the dictionary, and replace it if it matches.

example = "my name is michael and i was born in mi and am 6 ft"

def acr(text):
    acr_dict = {
        'ft': 'feet',
        'mi': 'michigan'
    }

    text_words = text.split()
    for i, word in enumerate(text_words):
        if word.lower() in acr_dict:
            text_words[i] = acr_dict[word.lower()]
    return ' '.join(text_words)

print(acr(example))
# my name is michael and i was born in michigan and am 6 feet

If you have non-trivial whitespace and are okay with regular expressions, you could do the following, which preserves the specific whitespace characters:

import re

def acr(text):
    acr_dict = {
        'ft': 'feet',
        'mi': 'michigan'
    }

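    for k, v in acr_dict.items():
        # (\A|\s) also matches at the start of the string; the lookahead
        # leaves the trailing whitespace in place for the next match.
        text = re.sub(rf"(\A|\s){k.lower()}(?=\s|\Z)", rf"\1{v}", text)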

    return text

If you are worried about performance, you could try compiling the regex for each acronym beforehand.
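A minimal sketch of that precompilation, assuming the same two-entry dictionary. The names `ACR_DICT`, `ACR_PATTERNS`, and `acr_compiled` are made up for illustration, and the pattern is a variation that uses the lookarounds `(?<!\S)` and `(?!\S)` rather than capture groups, so the surrounding whitespace is never consumed. Python's `re` module already caches recently used patterns, so the gain may be small; measure before relying on it.

```python
import re

ACR_DICT = {'ft': 'feet', 'mi': 'michigan'}  # assumed example data

# Compile one pattern per acronym once, when the module is loaded.
# (?<!\S): not preceded by a non-whitespace character.
# (?!\S):  not followed by a non-whitespace character.
ACR_PATTERNS = [
    (re.compile(rf"(?<!\S){re.escape(k)}(?!\S)"), v)
    for k, v in ACR_DICT.items()
]

def acr_compiled(text):
    for pattern, value in ACR_PATTERNS:
        # A function replacement inserts `value` literally,
        # without interpreting backslashes in it.
        text = pattern.sub(lambda m, value=value: value, text)
    return text

print(acr_compiled("mi is where i drink milk,\tabout 6 ft away"))
# michigan is where i drink milk,	about 6 feet away
```

Like the whitespace-based version, this only matches an acronym that is delimited by whitespace or the ends of the string, so something like "mi." with trailing punctuation is left alone.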

Upvotes: 1

plentyofcoffee

Reputation: 488

The simplest solution is, as others have stated, to use regexes.

import re

ACR_DICT = {'ft': 'feet', 'mi': 'michigan'}

def acr(text):
    for k, v in ACR_DICT.items():
        text = re.sub(rf'\b{k}\b', v, text)
    return text


acr('I might be 6 ft tall. I often left my home state of mi at 3 years old.')
# 'I might be 6 feet tall. I often left my home state of michigan at 3 years old.'

Note the usage of the word-boundary metacharacter '\b'. This will ensure that the regex doesn't find matches inside words like 'often' or 'might'.
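If the text may contain upper-case acronyms, or the dictionary grows, one possible variation is to join all keys into a single alternation and replace everything in one pass. This is only a sketch under the assumption that all dictionary keys are lower case; `ACR_RE` and `acr_once` are names invented here.

```python
import re

ACR_DICT = {'ft': 'feet', 'mi': 'michigan'}  # keys assumed to be lower case

# One pattern for all acronyms, equivalent to \b(?:ft|mi)\b,
# matched case-insensitively. re.escape guards against keys
# that contain regex metacharacters.
ACR_RE = re.compile(
    r'\b(?:' + '|'.join(re.escape(k) for k in ACR_DICT) + r')\b',
    re.IGNORECASE,
)

def acr_once(text):
    # Look up each match by its lower-cased form.
    return ACR_RE.sub(lambda m: ACR_DICT[m.group(0).lower()], text)

print(acr_once('I like milk and live in MI, 6 ft from the lake.'))
# I like milk and live in michigan, 6 feet from the lake.
```

Because '\b' treats punctuation as a boundary, 'MI,' is replaced here, while 'milk' is not.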

Upvotes: 3
