extratermi

Reputation: 13

Regex: how to match anything before and after uppercase sequences with a period as delimiter?

I have a large text made up of many sentences, some of which contain uppercase keywords. I only need to match the sentences that contain one or more uppercase words, for instance:

This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched.

Any suggestions? Thanks! I am not an advanced user...

Upvotes: 1

Views: 134

Answers (3)

Jana

Reputation: 5704

Using Python:

import re

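txt = 'This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words. This other sentence should not be matched. And THIS one should be.'

# Split on periods, then keep the pieces that contain an all-uppercase word.
# Note: \b[A-Z]+\b also matches one-letter words such as "I".
for s in txt.split('.'):
    if re.search(r'\b[A-Z]+\b', s):
        print(s)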

Output:

This is MY SENTENCE and I would like, this sentence, to be matched because it contains uppercase words
 And THIS one should be

Upvotes: 0

Duc Filan

Reputation: 7157

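This pattern matches every line that contains an all-uppercase word (enable the multiline flag so that ^ and $ anchor at each line):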

^.*\b[A-Z]+\b.*$
  • \b asserts a position at a word boundary
  • [A-Z]+ matches one or more characters in the range A (index 65) to Z (index 90)
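
In case it helps, here is a minimal Python sketch of how that pattern could be applied. It assumes the text has one sentence per line, as in the question's sample; the variable names are just for illustration.

```python
import re

# Sample text taken from the question, one sentence per line (an assumption).
text = """This is MY SENTENCE that should be matched.
And THIS one should be too.
This other sentence should not be matched."""

# re.MULTILINE makes ^ and $ anchor at every line, not only at the
# start and end of the whole string.
pattern = re.compile(r'^.*\b[A-Z]+\b.*$', re.MULTILINE)

# With no capture groups, findall returns the full text of each match.
matches = pattern.findall(text)
for line in matches:
    print(line)
```

Without re.MULTILINE, ^ and $ would only anchor at the very start and end of the string, and since . does not cross newlines, the three-line sample would not match at all.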

https://regex101.com/r/kUN41W/1


If a one-letter word such as "I" should NOT count as an UPPERCASE word, use this instead:

^.*\b[A-Z]{2,}\b.*$
  • {2,} is a quantifier: it matches 2 or more times, as many as possible, giving back as needed (greedy)
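
To see the difference between the two quantifiers, here is a small Python sketch. The sample lines are made up for this illustration and are not from the question.

```python
import re

# Hypothetical sample lines, invented for this comparison.
lines = [
    "I think this is fine.",
    "This is MY SENTENCE.",
    "Nothing to see here.",
]

# + accepts an uppercase word of any length, including the pronoun "I".
any_length = re.compile(r'^.*\b[A-Z]+\b.*$')
# {2,} requires at least two uppercase letters in a row.
two_or_more = re.compile(r'^.*\b[A-Z]{2,}\b.*$')

with_single = [s for s in lines if any_length.match(s)]
without_single = [s for s in lines if two_or_more.match(s)]

print(with_single)
print(without_single)
```

Only the first list should include the "I think..." line, because "I" is a complete word made of a single uppercase letter.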

Upvotes: 1

Martin B.

Reputation: 1663

Try some tools like https://regexr.com/. They really help to visualize what effect your regex has.

For your test data, this regex works:

([^\.]*[A-Z]{2,}[^\.]*)\.

It is composed of

  • [^\.]* any number of characters that are not a dot
  • [A-Z]{2,} at least 2 consecutive uppercase characters
  • [^\.]* any number of characters that are not a dot
  • \. the period that ends the sentence
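
As a rough sketch, this is one way the regex could be used from Python when all the sentences sit in a single string. The sample text is assembled from the question's sentences, and the strip() call is my own addition.

```python
import re

# Sample paragraph assembled from the question's sentences.
text = ("This is MY SENTENCE that should be matched. "
        "And THIS one should be too. "
        "This other sentence should not be matched.")

pattern = re.compile(r'([^\.]*[A-Z]{2,}[^\.]*)\.')

# findall returns the content of the single capture group, i.e. the
# sentence without its closing period. strip() removes the space that
# is left over after the previous sentence's period.
sentences = [m.strip() for m in pattern.findall(text)]

for s in sentences:
    print(s)
```

Because this pattern has no \b, two adjacent capitals inside a mixed-case word would also count as a match, so it may need tightening for other inputs.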

Upvotes: 1
