fooquency

Reputation: 1654

Regular expression to match only in the middle of a sentence

I'm trying to create a regexp to match the standalone letters I and V only when in the middle of a sentence.

I'm using preg_match_all, as there could be multiple matches.

I'm happy to use multiple regexps if needed, i.e. if it is clearer to split things out rather than have a single complex regexp, that is fine.

The string will never contain line breaks, so it is never multiline.

Examples:

Materialy I regionalʹnoĭ would create a match, and capture I.

Materialy V regionalʹnoĭ would create a match, and capture V.

V strane lʹdov - would not create a match, because the V is at the start of the string.

Materialy. V dvukh tomakh would not create a match, because the V is at the start of a sentence, i.e. after a dot-space.

John i Vladimir would not create a match for the V, because the V is not standalone.

John i Vladimir would not create a match for the i, because the i is lower-case.

V strane lʹdov - Materialy I regionalʹnoĭ would capture only the I, because only that is in the middle of a sentence.

I've been trying various combinations of ^ (?: (?! [] and so on, but can't get this to work.

Upvotes: 1

Views: 1404

Answers (2)

Wiktor Stribiżew

Reputation: 626747

You may use

'~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~'

See the regex demo

If the number of spaces is normalized in the sentences, you may just use

'~(?<=\s)(?<![?!.]\s)[VI](?=\s)~'

See this demo
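To see why the normalization caveat matters, here is a small sketch that runs both patterns on a sentence boundary with one space and with two spaces after the dot. The helper name `standaloneLetters` is made up for this illustration; the patterns are the two given above.

```php
<?php
// Hypothetical helper: returns every standalone V or I that $pattern finds in $text.
function standaloneLetters(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches; empty array when nothing matched
}

$skipFail   = '~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~';
$lookbehind = '~(?<=\s)(?<![?!.]\s)[VI](?=\s)~';

$oneSpace  = 'Materialy. V dvukh tomakh';
$twoSpaces = 'Materialy.  V dvukh tomakh';

// One space after the dot: both patterns reject the sentence-initial V.
echo json_encode(standaloneLetters($skipFail, $oneSpace)), "\n";    // []
echo json_encode(standaloneLetters($lookbehind, $oneSpace)), "\n";  // []

// Two spaces after the dot: the fixed-width lookbehind only inspects the
// two characters before V (two spaces here), so it no longer sees the dot.
echo json_encode(standaloneLetters($skipFail, $twoSpaces)), "\n";   // []
echo json_encode(standaloneLetters($lookbehind, $twoSpaces)), "\n"; // ["V"]
```

So the shorter lookbehind version is only equivalent when at most one whitespace character can follow the sentence-ending punctuation.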

NOTE: If you need to make it work on multi-line text, it may be safer to replace every \s with \h, so that only horizontal whitespace is matched.
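As a rough illustration of that note, the sketch below feeds a two-line string to the \s version and to a \h version of the first pattern. The two-line input and the variable names are assumptions made for the example; the question itself says its strings are never multiline.

```php
<?php
// The first pattern as given, and the same pattern with every \s swapped for \h.
$withS = '~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~';
$withH = '~[?!.]\h*[VI](*SKIP)(*F)|(?<=\h)[VI](?=\h)~';

// Assumed input: two lines, the second one starting with a standalone V.
$text = "Materialy I regionalʹnoĭ\nV strane lʹdov";

preg_match_all($withS, $text, $s);
preg_match_all($withH, $text, $h);

// \s also matches the newline, so the V that opens line 2 counts as
// "preceded by whitespace" and is reported.
echo json_encode($s[0]), "\n"; // ["I","V"]

// \h matches only horizontal whitespace, so the line-initial V is left out.
echo json_encode($h[0]), "\n"; // ["I"]
```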

Details

  • [?!.]\s*[VI] - matches ?, ! or ., then zero or more whitespace characters, then a V or I
  • (*SKIP)(*F) - since such a match is unwanted, discard it and resume searching after it
  • | - or
  • (?<=\s)[VI](?=\s) - match V or I when it is surrounded by whitespace.
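Putting the pieces together, here is a minimal sketch that runs the first pattern through preg_match_all on the examples from the question. The function name `midSentenceLetters` is hypothetical, chosen only for this example.

```php
<?php
// Hypothetical wrapper around the first pattern from this answer.
function midSentenceLetters(string $text): array
{
    $pattern = '~[?!.]\s*[VI](*SKIP)(*F)|(?<=\s)[VI](?=\s)~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // every standalone V or I found mid-sentence
}

// Inputs taken from the question.
$examples = [
    'Materialy I regionalʹnoĭ',
    'Materialy V regionalʹnoĭ',
    'V strane lʹdov -',
    'Materialy. V dvukh tomakh',
    'John i Vladimir',
    'V strane lʹdov - Materialy I regionalʹnoĭ',
];

foreach ($examples as $text) {
    // Prints the input followed by the list of captured letters (possibly empty).
    echo $text, ' => [', implode(', ', midSentenceLetters($text)), "]\n";
}
```

With these inputs the first, second and last strings each yield a single letter (I, V and I respectively), and the other three yield nothing, which is the behaviour described in the question.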

Upvotes: 2

N69S

Reputation: 17206

Here is a simple regex that satisfies your use cases.

// $matches[1] holds the captured letter when the pattern matches
preg_match('/.*[^\.]\h([VI])\h.*/', 'V strane lʹdov - Materialy I regionalʹnoĭ', $matches);

Upvotes: 0
