Reputation: 1437
I'd like to clean up a subtitle file that has many errors because of OCR. One of the errors is that the lowercase l is displayed as an uppercase I. Of course, sometimes the I really is an I, mainly in cases such as:
I'm Ieaving... or - I'm Ieaving...
IsabeIIe
Since names are difficult to detect, I figured it would be best to replace only the I's that have one or more lowercase letters directly preceding them and check the rest manually. So after the conversion I get I'm Ieaving and Isabelle. This is the most 'barebones' automated solution I can think of, since there are not that many words that have a lowercase letter directly preceding an uppercase letter.
How can I do this in Regex? Thanks in advance.
Upvotes: 2
Views: 4760
Reputation: 6741
/([a-z])I/
would match an uppercase I preceded by any lowercase letter a-z, capturing that letter in group 1 so it can be put back in the replacement.
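As a rough sketch of how that pattern could be used for the replacement, here it is in Python (the question does not name a language, so Python and the function name fix_ocr_capture are assumptions for illustration). Because the match consumes the preceding letter, one pass only fixes the first I of a run such as "eII", so the sketch repeats the substitution until nothing changes.

```python
import re

# Group 1 is the lowercase letter that must be kept; the "I" after it becomes "l".
PATTERN = re.compile(r"([a-z])I")

def fix_ocr_capture(line: str) -> str:
    """Hypothetical helper: replace 'I' with 'l' when a lowercase letter precedes it."""
    previous = None
    # One pass turns "eII" into "elI"; the next pass sees the new "l"
    # and fixes the remaining "I". Stop when a pass changes nothing.
    while previous != line:
        previous = line
        line = PATTERN.sub(r"\1l", line)
    return line

print(fix_ocr_capture("IsabeIIe"))        # Isabelle
print(fix_ocr_capture("I'm Ieaving..."))  # I'm Ieaving... (left for manual review)
```

The word-initial I in "Ieaving" is deliberately untouched, which matches the plan in the question to check those cases by hand.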
Upvotes: 0
Reputation:
Either one of these will work, provided your engine supports inline modifier groups.
(?-i:(?<=[a-z])I)
or
(?-i:[a-z]I)
For Unicode text, you will want to use Unicode properties (for example \p{Ll} in place of [a-z], where your engine supports it).
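Not every engine has property classes. Python's built-in re module, for instance, does not accept \p{...}, so a minimal sketch there could check the preceding character with str.islower() inside a replacement callback instead. The function name fix_ocr_unicode is hypothetical, and this is a workaround, not the property-based pattern itself.

```python
import re

def fix_ocr_unicode(line: str) -> str:
    """Hypothetical helper: like [a-z]I, but any Unicode lowercase letter counts."""
    def repl(match: re.Match) -> str:
        prev, run = match.group(1), match.group(2)
        if prev.islower():
            # Keep the preceding letter and turn the whole run of I's into l's.
            return prev + "l" * len(run)
        # Preceded by an uppercase letter, digit or underscore: leave unchanged.
        return match.group(0)

    # \w matches Unicode word characters for str patterns; I+ grabs the run.
    return re.sub(r"(\w)(I+)", repl, line)

print(fix_ocr_unicode("éIan"))      # élan
print(fix_ocr_unicode("IsabeIIe"))  # Isabelle
```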
Upvotes: 1
Reputation: 44326
If your regex engine supports lookbehind, you can find all I's preceded by a lowercase letter like this:
(?<=[a-z])I
Otherwise, you could match both characters, and the second one will be the I.
[a-z]I
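A minimal sketch of the lookbehind variant in Python follows (the language and the name fix_ocr_lookbehind are assumptions, since the question does not say which tool is used). One detail worth noting: the lookbehind inspects the original text, so in "eII" the second I is preceded by an uppercase I and would be skipped. Matching the whole run with I+ and replacing it with the same number of l's handles that in one pass.

```python
import re

def fix_ocr_lookbehind(line: str) -> str:
    """Hypothetical helper: replace runs of 'I' that follow a lowercase letter."""
    # (?<=[a-z]) asserts a lowercase letter before the run without consuming it;
    # I+ takes the whole run, so "eII" becomes "ell" in a single pass.
    return re.sub(r"(?<=[a-z])I+", lambda m: "l" * len(m.group()), line)

print(fix_ocr_lookbehind("IsabeIIe"))          # Isabelle
print(fix_ocr_lookbehind("- I'm Ieaving..."))  # - I'm Ieaving... (unchanged)
```

Names with a genuine capital I after a lowercase letter, such as "McIntyre", would also be rewritten by this pattern, so a manual check of the output is still needed.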
Upvotes: 2