albero

Reputation: 199

Missing something in the regex?

I'm trying to use this regex

art\..*[A-Z].*\s

to extract the text in bold here

some text bla art. 100 of Important_text other text bla

Basically, I would like to extract all the text that follows this pattern:

*art.* *number* *whatever* *first word that starts in uppercase*

But it's not working as expected. Any suggestions?

Upvotes: 1

Views: 89

Answers (2)

RavinderSingh13

Reputation: 133438

With your shown samples, please try the following.

\bart\..*?\d+.*?[A-Z]\w*

Online demo for above regex

Explanation: a detailed breakdown of the regex above.

\b           ## Assert a word boundary, so art is not matched as the tail of a longer word (for example, part.).
art\.        ## Match the word art followed by a literal dot.
.*?\d+       ## Lazily match any characters up to the first run of 1 or more digits.
.*?[A-Z]\w*  ## Lazily match any characters up to the first uppercase letter, then the word characters that follow it.
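
As a rough illustration of the breakdown above, here is a minimal Python sketch that runs both the pattern from the question and the pattern from this answer against the sample sentence. Python's re module and the variable names are my own choice for the demo and are not part of the original post; other regex flavours may behave slightly differently.

```python
import re

# Sample sentence taken from the question.
text = "some text bla art. 100 of Important_text other text bla"

# Pattern from the question: the greedy .* after [A-Z] runs on to the
# last whitespace character in the line, so it over-matches.
greedy = re.search(r"art\..*[A-Z].*\s", text)

# Pattern from this answer: the lazy .*? stops at the first digits and
# then at the first uppercase letter; \w* consumes the rest of that word.
lazy = re.search(r"\bart\..*?\d+.*?[A-Z]\w*", text)

greedy_match = greedy.group(0)
lazy_match = lazy.group(0)

print(repr(greedy_match))
print(repr(lazy_match))
```

The first print shows the match running past the wanted word, while the second stops at the end of Important_text.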

Upvotes: 5

The fourth bird

Reputation: 163207

You can match art., then match up to the first digits, and then match up to the first occurrence of an uppercase char.

\bart\.\D*\d+[^A-Z]*[A-Z]\S*

The pattern matches

  • \bart\. Match art. preceded by a word boundary
  • \D*\d+ Match 0 or more non-digit chars, followed by 1 or more digits
  • [^A-Z]* Match 0 or more chars other than A-Z
  • [A-Z]\S* Match a char A-Z followed by optional non-whitespace chars
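
The parts listed above can be tried out with a short Python sketch. The test sentence with two occurrences is made up for this demo (only the first half comes from the question), and using re.findall is an assumption about how the pattern would be applied.

```python
import re

# Hypothetical input with two "art." references; only the first one
# appears in the question, the second is invented for illustration.
text = "see art. 100 of Important_text and art. 7 bis Other_word end"

# Negated character classes (\D, [^A-Z]) walk forward without the
# backtracking that .*? needs, then [A-Z]\S* takes the uppercase word.
pattern = re.compile(r"\bart\.\D*\d+[^A-Z]*[A-Z]\S*")

# With no capture groups, findall returns the full text of each match.
matches = pattern.findall(text)
print(matches)
```

Note that \D and [^A-Z] also match newlines, so on multi-line input a match can span lines.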

Regex demo

If the word has to start with A-Z you can assert a whitespace boundary to the left using (?<!\S) before matching an uppercase char A-Z.

\bart\.\D*\d+[^A-Z]*(?<!\S)[A-Z]\S*
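
To see what the (?<!\S) assertion changes, here is a small Python comparison of the two patterns. The second test string, with an uppercase letter in the middle of a word, is a made-up example and not from the question.

```python
import re

plain = re.compile(r"\bart\.\D*\d+[^A-Z]*[A-Z]\S*")
strict = re.compile(r"\bart\.\D*\d+[^A-Z]*(?<!\S)[A-Z]\S*")

# Sample from the question: the uppercase letter starts a word.
sample = "some text bla art. 100 of Important_text other text bla"

# Hypothetical input: the first uppercase letter sits inside a word.
mid_word = "see art. 5 of someText"

def first_match(regex, text):
    """Return the matched text, or None when the pattern does not match."""
    m = regex.search(text)
    return m.group(0) if m else None

sample_plain = first_match(plain, sample)
sample_strict = first_match(strict, sample)
mid_plain = first_match(plain, mid_word)
mid_strict = first_match(strict, mid_word)

print(sample_plain, sample_strict, mid_plain, mid_strict, sep="\n")
```

On the sample from the question both patterns give the same result. On the second string the plain pattern matches up to someText, while the stricter one finds no match at all: [^A-Z]* cannot skip past an uppercase letter, so it does not move on to a later word.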

Upvotes: 4
