Reputation: 1654
I'm trying to create a regexp to match the standalone letters I and V only when in the middle of a sentence.
I'm using preg_match_all, as there could be multiple matches.
I'm happy to use several regexps if that is clearer than one complex regexp.
The string will never contain line breaks; it is never multiline.
Examples:
Materialy I regionalʹnoĭ
would create a match, and capture I.
Materialy V regionalʹnoĭ
would create a match, and capture V.
V strane lʹdov
would not create a match, because the V is at the start of the string.
Materialy. V dvukh tomakh
would not create a match, because the V is at the start of a sentence, i.e. after a dot-space.
John i Vladimir
would not create a match for the V, because the V is not standalone.
John i Vladimir
would not create a match for the i, because the i is lower-case.
V strane lʹdov - Materialy I regionalʹnoĭ
would capture only the I, because only that is in the middle of a sentence.
I've been trying various combinations of ^ (?: (?! [] and so on, but can't get this to work.
Upvotes: 1
Views: 1404
Reputation: 626747
You may use
'~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~'
See the regex demo
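As a rough sketch of how this pattern could be wired into preg_match_all, here it is run against the examples from the question. The function name midSentenceLetters is made up for illustration, and the u modifier is my own addition, assuming the input is valid UTF-8.

```php
<?php
// Hypothetical helper (name is illustrative, not from the answer).
// Returns every standalone V or I that does not directly follow
// sentence-ending punctuation or the start of the string.
function midSentenceLetters(string $text): array
{
    // First alternative: punctuation, optional whitespace, then V/I.
    // (*SKIP)(*F) discards that text and resumes searching after it.
    // Second alternative: V/I with whitespace on both sides.
    // The "u" modifier is an assumption (UTF-8 input).
    $pattern = '~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~u';

    if (preg_match_all($pattern, $text, $matches) === false) {
        return []; // regex error, e.g. invalid UTF-8 input
    }
    return $matches[0];
}

$examples = [
    'Materialy I regionalʹnoĭ',
    'Materialy V regionalʹnoĭ',
    'V strane lʹdov',
    'Materialy. V dvukh tomakh',
    'John i Vladimir',
    'V strane lʹdov - Materialy I regionalʹnoĭ',
];

foreach ($examples as $example) {
    echo $example, ' => [', implode(', ', midSentenceLetters($example)), "]\n";
}
```

Only the first, second and last examples should produce a capture; the others come back as empty arrays.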
If whitespace is normalized in the sentences (exactly one whitespace character after sentence-ending punctuation), you may just use
'~(?<=\s)(?<![?!.]\s)[VI](?=\s)~'
See this demo
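To illustrate why the normalization matters for this shorter pattern, here is a small sketch. The helper names are hypothetical, the u modifier assumes UTF-8 input, and collapsing whitespace with preg_replace is just one possible way to normalize.

```php
<?php
// Hypothetical helper: applies the lookbehind-only pattern as-is.
function matchLookbehindOnly(string $text): array
{
    // (?<![?!.]\s) only looks back across ONE whitespace character,
    // so it relies on single spacing after punctuation.
    $pattern = '~(?<=\s)(?<![?!.]\s)[VI](?=\s)~u';

    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// Hypothetical helper: collapse whitespace runs to one space first.
function matchNormalized(string $text): array
{
    $normalized = preg_replace('~\s+~u', ' ', $text);
    return matchLookbehindOnly($normalized ?? $text);
}

$doubleSpaced = 'Materialy.  V dvukh tomakh'; // two spaces after the dot

var_dump(matchLookbehindOnly($doubleSpaced)); // the V slips through here
var_dump(matchNormalized($doubleSpaced));     // empty once spacing is collapsed
var_dump(matchNormalized('Materialy I regionalʹnoĭ'));
```

With two spaces after the dot, the character two positions before the V is a space rather than punctuation, so the negative lookbehind does not reject it. That is the case the (*SKIP)(*F) version handles without any preprocessing.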
NOTE: If you need to make it work on multiline text, it may be safer to replace every \s with \h, so that only horizontal whitespace is matched.
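A quick sketch of the difference on a two-line string, in case it helps. The sample text and the function name findLetters are made up for illustration, and the u modifier again assumes UTF-8 input.

```php
<?php
// Hypothetical helper: run a given pattern and return the full matches.
function findLetters(string $pattern, string $text): array
{
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// \s also matches "\n", so a V at the start of line 2 counts as
// "preceded by whitespace" and is matched.
$withS = '~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~u';

// \h matches horizontal whitespace only (space, tab, ...), so a
// letter right after a line break is not treated as mid-sentence.
$withH = '~[?!.]\h*[VI](*SKIP)(*F)|(?<=\h)[VI](?=\h)~u';

// Made-up two-line input; double quotes so that \n is a real newline.
$text = "Materialy I regionalʹnoĭ\nV strane lʹdov";

echo implode(', ', findLetters($withS, $text)), "\n"; // I, V
echo implode(', ', findLetters($withH, $text)), "\n"; // I
```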
Details
[?!.]\s*[VI] - matches ?, ! or ., then 0 or more whitespace characters, and then a V or I
(*SKIP)(*F) - since we know these are not welcome, discard this match and go on searching from where it ended
| - or
(?<=\s)[VI](?=\s) - match V or I when surrounded by whitespace.
Upvotes: 2
Reputation: 17206
Here is a simple regex that satisfies your use cases.
preg_match('/.*[^\.]\h([VI])\h.*/', 'V strane lʹdov - Materialy I regionalʹnoĭ');
Upvotes: 0