Replacing the dots for a list of abbreviations?

Question

I'm trying to remove the dots from a list of abbreviations so that they don't confuse the sentence tokenizer. This should be very straightforward, but I don't know why my code isn't working.

Here is my code:

import re

abbrevs = [
    "No.", "U.S.", "Mses.", "B.S.", "B.A.", "D.C.", "B.Tech.", "Pte.", "Mr.", "O.E.M.",
    "I.R.S", "sq.", "Reg.", "S-K."
]



def replace_abbrev(abbrs, text):
    re_abbrs = [r"\b" + re.escape(a) + r"\b" for a in abbrs]

    abbr_no_dot = [a.replace(".", "") for a in abbrs]

    pattern_zip = zip(re_abbrs, abbr_no_dot)

    for p in pattern_zip:
        text = re.sub(p[0], p[1], text)

    return text

text = "Test No. U.S. Mses. B.S. Test"

text = replace_abbrev(abbrevs, text)

print(text)

Here is the result. Nothing was replaced. What is wrong? Thanks.

Test No. U.S. Mses. B.S. Test

vks · Accepted Answer

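Drop the trailing \b from the pattern:

re_abbrs = [r"\b" + re.escape(a) for a in abbrs]

\b matches only at a position with a word character on one side and a non-word character (or the start or end of the string) on the other. Most of your abbreviations end in ".", which is a non-word character, and in your text it is followed by a space, which is also a non-word character. There is therefore no word boundary after the final ".", so the original pattern never matches. With the trailing \b removed, the output is: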

Test No US Mses BS Test
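
A possible variation on the fix above, offered as a sketch rather than as part of the original answer: the abbreviations can be combined into one compiled pattern, with a negative lookahead (?!\w) standing in for the trailing \b so that an entry without a final dot (such as "I.R.S") is still guarded on the right. The shortened abbrevs list and the longest-first ordering are assumptions made for illustration.

```python
import re

# Shortened list for illustration (assumption); the full list works the same way.
abbrevs = ["No.", "U.S.", "Mses.", "B.S.", "I.R.S"]


def replace_abbrev(abbrs, text):
    # Try longer abbreviations first so a shorter one that is a prefix
    # of a longer one cannot win the alternation.
    ordered = sorted(abbrs, key=len, reverse=True)
    # \b guards the start, where every abbreviation begins with a letter.
    # (?!\w) guards the end and, unlike a trailing \b, still succeeds
    # when the abbreviation ends in "." and is followed by a space.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")(?!\w)"
    )
    # Strip the dots from whichever abbreviation matched.
    return pattern.sub(lambda m: m.group(0).replace(".", ""), text)


print(replace_abbrev(abbrevs, "Test No. U.S. Mses. B.S. Test"))
```

This prints Test No US Mses BS Test, the same result as the loop-based fix, while scanning the text only once.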
