Daan

Reputation: 1437

Regex: Match only characters with preceding lowercase letter(s)

I'd like to clean up a subtitle file that has many errors because of OCR. One of the errors is that the l is displayed as I. Of course, sometimes the I really is an I, mainly in the pronoun I (as in I'm) and at the start of names (as in Isabelle).

Since names are difficult to detect, I figured it would be best to replace only the I's that are directly preceded by one or more lowercase letters and check the rest manually. So after the conversion I would still get I'm Ieaving and Isabelle. This is the most bare-bones automated solution I can think of, since not many words have a lowercase letter directly preceding an uppercase letter.

How can I do this in Regex? Thanks in advance.

Upvotes: 2

Views: 4760

Answers (3)

Ben Taber

Reputation: 6741

/([a-z])I/ matches an uppercase I preceded by any lowercase letter a-z. The lowercase letter is captured in group 1, so you can put it back in the replacement.
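As a rough sketch of how that pattern could be used for the replacement itself, here it is in Python's re module. The function name fix_ocr_l and the sample sentence are made up for illustration, and the asker's actual tool may use a different replacement syntax (for example $1 instead of \1).

```python
import re

# Matches a lowercase letter followed by an uppercase I.
# The lowercase letter is captured so it can be restored in the replacement.
PATTERN = re.compile(r"([a-z])I")


def fix_ocr_l(text):
    """Replace an I that directly follows a lowercase letter with l.

    Hypothetical helper, not part of any library.
    """
    # \1 re-inserts the captured lowercase letter; only the I changes.
    return PATTERN.sub(r"\1l", text)


if __name__ == "__main__":
    print(fix_ocr_l("I'm Ieaving with Isabelle, reaIly"))
```

Word-initial cases such as Ieaving are left alone, which is the part the asker planned to check manually.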

Upvotes: 0

user557597

Reputation:

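Either one of these will work, provided your engine supports inline modifier groups. The (?-i:...) group switches case-insensitive matching off inside it, so [a-z] matches only lowercase letters even if the rest of the search is case-insensitive.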

(?-i:(?<=[a-z])I)
or
(?-i:[a-z]I)

For Unicode text, you will want to use a property such as \p{Ll} (lowercase letter) instead of [a-z], if your engine supports it.
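To illustrate why the modifier group matters, here is a small sketch in Python, assuming version 3.6 or later (where scoped inline flags such as (?-i:...) are available). The sample text and function names are invented for the demonstration. With case-insensitive matching switched on globally, the unscoped pattern also hits ordinary lowercase i's, while the scoped one does not.

```python
import re

SAMPLE = "ISABELLE is Ieaving, reaIly"

# Unscoped: under re.IGNORECASE, [a-z] also matches uppercase letters
# and I also matches a lowercase i.
unscoped = re.compile(r"(?<=[a-z])I", re.IGNORECASE)

# Scoped: (?-i:...) turns case-insensitivity off inside the group,
# even though re.IGNORECASE is set for the pattern as a whole.
scoped = re.compile(r"(?-i:(?<=[a-z])I)", re.IGNORECASE)


def replace_unscoped(text):
    return unscoped.sub("l", text)


def replace_scoped(text):
    return scoped.sub("l", text)


if __name__ == "__main__":
    print(replace_unscoped(SAMPLE))  # also damages the i in "Ieaving"
    print(replace_scoped(SAMPLE))    # only touches the I in "reaIly"
```

Note that Python's built-in re module does not support \p{Ll}; that part of the answer applies to engines with Unicode property support.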

Upvotes: 1

Kendall Frey

Reputation: 44326

If your regex engine supports lookbehind, you can find all I's preceded by a lowercase letter like this:

(?<=[a-z])I

Otherwise, you could match both characters, and the second one will be the I.

[a-z]I
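The difference between the two patterns is only in what the match covers. A minimal sketch in Python (the helper names and sample string are hypothetical): with the lookbehind, the match is the I itself; with the two-character pattern, the I is the last character of the match.

```python
import re

lookbehind = re.compile(r"(?<=[a-z])I")  # match covers only the I
pair = re.compile(r"[a-z]I")             # match covers the letter and the I


def i_positions_lookbehind(text):
    # The match starts exactly at the I.
    return [m.start() for m in lookbehind.finditer(text)]


def i_positions_pair(text):
    # The I is the second (last) character of each two-character match.
    return [m.end() - 1 for m in pair.finditer(text)]


if __name__ == "__main__":
    sample = "heIp me, Isabelle"
    print(i_positions_lookbehind(sample))
    print(i_positions_pair(sample))
```

Both helpers report the same index for the I; the lookbehind version is simply more convenient for replacing, because the preceding letter does not have to be put back.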

Upvotes: 2
