Reputation: 199
I'm trying to use this regex
art\..*[A-Z].*\s
to extract the text in bold here (art. 100 of Important_text):
some text bla art. 100 of Important_text other text bla
Basically, I would like to extract all the text that follows this pattern:
*art.* *number* *whatever* *first word that starts in uppercase*
But it's not working as expected. Any suggestions?
Upvotes: 1
Views: 89
Reputation: 133438
With your shown samples, please try the following.
\bart\..*?\d+.*?[A-Z]\w*
Explanation: a detailed breakdown of the pattern above.
\b ##Word boundary, so that art is not matched at the end of a longer word such as part.
art\. ##Match the word art followed by a literal dot.
.*?\d+ ##Lazily match any characters up to the first run of 1 or more digits.
.*?[A-Z]\w* ##Lazily match any characters up to the first capital letter, then 0 or more word characters.
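As a quick check, the pattern can be tried with Python's re module (the thread does not name a language, so Python is an assumption here). The sketch also runs the original pattern from the question to show why it overshoots: its greedy .* runs to the end of the line and only backtracks as far as the last uppercase letter and the last whitespace.

```python
import re

# Sample line from the question.
text = "some text bla art. 100 of Important_text other text bla"

# Original pattern from the question: both .* are greedy.
greedy = re.compile(r"art\..*[A-Z].*\s")
# Pattern from this answer: lazy .*? stops at the first digits / first capital.
lazy = re.compile(r"\bart\..*?\d+.*?[A-Z]\w*")

greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()

print(repr(greedy_match))  # 'art. 100 of Important_text other text '
print(repr(lazy_match))    # 'art. 100 of Important_text'
```

Note that \w* ends the match at the first non-word character, so a hyphenated word such as Title-Case would be cut after Title.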
Upvotes: 5
Reputation: 163207
You can match art. first, then match up to the first digits, and then match up to the first occurrence of an uppercase char.
\bart\.\D*\d+[^A-Z]*[A-Z]\S*
The pattern matches:
\bart\.
Match art. preceded by a word boundary
\D*\d+
Match 0+ times a non-digit, followed by 1+ digits
[^A-Z]*
Match 0+ times any char except A-Z
[A-Z]\S*
Match a char A-Z followed by optional non-whitespace chars

If the word has to start with A-Z, you can assert a whitespace boundary to the left using (?<!\S) before matching an uppercase char A-Z.
\bart\.\D*\d+[^A-Z]*(?<!\S)[A-Z]\S*
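To see how the two variants differ, here is a minimal sketch in Python (an assumption, since the thread does not name a language). The helper first_match and the camelCase input are hypothetical, added only for illustration.

```python
import re

# Variant without the lookbehind: the capital may sit in the middle of a word.
plain = re.compile(r"\bart\.\D*\d+[^A-Z]*[A-Z]\S*")
# Variant with (?<!\S): the capital must be at the start of a word.
strict = re.compile(r"\bart\.\D*\d+[^A-Z]*(?<!\S)[A-Z]\S*")


def first_match(pattern, text):
    """Return the first matched text, or None when there is no match."""
    m = pattern.search(text)
    return m.group() if m else None


sample = "some text bla art. 100 of Important_text other text bla"
camel = "art. 5 fooBar Baz"  # hypothetical input with a capital mid-word

print(first_match(plain, sample))   # art. 100 of Important_text
print(first_match(strict, sample))  # art. 100 of Important_text
print(first_match(plain, camel))    # art. 5 fooBar
print(first_match(strict, camel))   # None
```

On the second input the stricter pattern does not skip ahead to Baz: [^A-Z]* cannot step over the B in fooBar, and the lookbehind rejects that B, so the whole match fails.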
Upvotes: 4