Need a Regex that adds a space after a period, but can account for abbreviations such as U.S. or D.C

Question

Here is what I have so far:

text = re.sub(r'(?<=\.)(?=[A-Z])', ' ', text)

This already skips periods followed by digits or lowercase letters, but I need it to handle the edge case where initials are separated by periods.

An example sentence where I wouldn't want to add a space would be:

The U.S. health care is more expensive than U.K health care.

Currently, my regex makes it like:

The U. S. health care is more expensive than U. K health care.

But I want it to look exactly like the first sentence, without spaces inside U.S. and U.K

I'm not sure how to do this, any advice would be appreciated!

EDIT:

(?<=\.)(?=[A-Z][a-z]{1,})

makes it skip single-letter initials such as the S in U.S., because the capital letter must now be followed by at least one lowercase letter.
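
For reference, here is a minimal sketch of how the pattern from the edit can be run, assuming Python's re module (which the re.sub call above suggests). The helper name add_space_after_period and the sample strings are made up for illustration, and this is not meant as a complete solution.

```python
import re

# Pattern from the EDIT: a zero-width position that sits right after a period
# and right before a capital letter followed by at least one lowercase letter.
PATTERN = re.compile(r'(?<=\.)(?=[A-Z][a-z]+)')


def add_space_after_period(text):
    # Hypothetical helper name; inserts a single space at every matched position.
    return PATTERN.sub(' ', text)


# Initials are left alone because "S" and "K" are not followed by a lowercase letter.
print(add_space_after_period(
    "The U.S. health care is more expensive than U.K health care."))

# A missing space before an ordinary capitalized word is filled in.
print(add_space_after_period("It works.Then it stops."))
```

One limitation of this sketch: a sentence that starts with a one-letter word, as in "Done.I left.", is not changed, because "I" is not followed by a lowercase letter.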
