Reputation: 55
Before I begin, let me say I am new to regex, but today I have done extensive research and cannot find a solution to the following problem.
EDIT: I want to return just the numbers in all examples, with the punctuation excluded.
A single-character string is not returned when it is surrounded by punctuation and the pattern is written to leave that punctuation out of the match.
Here's a basic example of this problem.
[^<].*[^>] on <12> returns 12
[^<].*[^>] on <1> returns nothing
If the punctuation you are leaving out is on only one side, it works fine.
[^<].* on <1 returns 1
.*[^>] on 1> returns 1
[^<].*[^>] on <1> returns nothing
Here are the regexes I have tried and their results.
[^<].*[^>] on <1> returns nothing
[^<][.]*[^>] on <1> returns nothing
[^<]+[^>] on <1> returns nothing
[^<][^\r\n]*[^>] on <1> returns nothing
[^<]\w*[^>] on <1> returns nothing
[^<]\d*[^>] on <1> returns nothing
[^<].?[^>] on <1> returns nothing
[^<][0-9]?[^>] on <1> returns nothing
[^<].*?[^>] on <1> returns nothing
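A minimal sketch for reproducing the basic examples above, assuming Python's re module (the question does not name a language or tool, so other engines may differ in details). The helper name first_match is only for illustration.

```python
import re

# Patterns and inputs taken from the basic examples in the question.
cases = [
    (r"[^<].*[^>]", "<12>"),
    (r"[^<].*[^>]", "<1>"),
    (r"[^<].*", "<1"),
    (r".*[^>]", "1>"),
]

def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

results = [first_match(p, t) for p, t in cases]
for (p, t), r in zip(cases, results):
    print(f"{p} on {t} returns {r!r}")
```

Only the second case comes back as None, which is the behaviour being asked about.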
Any help would be greatly appreciated.
Upvotes: 1
Views: 116
Reputation: 3888
Although your regular expression sometimes works, it is wrong. Let me first explain:
[^<] means any character that's not a less-than sign <. The ^ negates a character class when it is the first character between the brackets [].
.* matches any character zero or more times.
Let's look at how your regex works.
[^<].*[^>] with <12>:
[^<] can't match <, so the match starts at 1.
.* is greedy and first takes the rest of the string, 2>.
[^>] then has nothing left to match, so the regular expression engine backtracks. When .* gives back the >, [^>] can't match it, so the engine backtracks again: now .* matches nothing and [^>] matches 2. The result is 12.
[^<].*[^>] with <1>:
[^<] can't match <, so the match starts at 1.
.* matches the >.
[^>] has nothing left to match because the end of the string has already been reached, so the engine backtracks: to have a match it needs a character that's not >. Now .* matches nothing, but the next character is >, which [^>] rejects. That's why the match fails.
What you meant to do is ^<(.*?)>, where:
^ matches the beginning of the string (you can omit it if you want to match anywhere in the string).
< matches a less-than sign.
.*? matches zero or more occurrences of any character, as few as possible. If you want to be more specific and match only digits, use \d or [0-9] in place of the period.
> matches a greater-than sign.
The parentheses capture the characters they enclose; this is called a capture group in regex jargon.
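As a rough illustration of the capture-group pattern, here is a small sketch assuming Python's re module (the question does not say which language is in use). The function names are hypothetical; the second one swaps the period for \d as suggested above.

```python
import re

def between_angle_brackets(text):
    """Return whatever ^<(.*?)> captures, or None if there is no match."""
    m = re.search(r"^<(.*?)>", text)
    return m.group(1) if m else None

def digits_between_angle_brackets(text):
    """Same idea, but \\d replaces the period so only digits are accepted."""
    m = re.search(r"^<(\d*?)>", text)
    return m.group(1) if m else None

for sample in ["<1>", "<12>", "<ab>"]:
    print(sample, between_angle_brackets(sample), digits_between_angle_brackets(sample))
```

Because the brackets are matched literally rather than through negated character classes, a single character between them is enough for a match.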
Another way to go about this is to use lookaheads (?=) and lookbehinds (?<=). These are zero-width assertions: they check that the following (respectively, preceding) characters match the given pattern without including them in the match.
The regex would become (?<=<).*(?=>), which means: match whatever is between < and >.
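A short sketch of the lookaround version, again assuming Python's re module. Lookbehind support varies between regex engines, so this may not carry over unchanged to other tools; the name inner_text is hypothetical.

```python
import re

# The lookbehind requires a "<" just before the match and the lookahead
# requires a ">" just after it; neither bracket is part of the match.
LOOKAROUND = re.compile(r"(?<=<).*(?=>)")

def inner_text(text):
    """Return the text between < and >, or None if there is no match."""
    m = LOOKAROUND.search(text)
    return m.group(0) if m else None

for sample in ["<1>", "<12>", "12"]:
    print(sample, inner_text(sample))
```

Here the whole match is already the wanted text, so no capture group needs to be extracted.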
Upvotes: 2
Reputation: 156
The [^<] (any character that is not a "<") matches the 1 of 12, then .* matches nothing and [^>] (any character that is not a ">") matches the 2.
If you are looking to extract the digits between the < and >, your regex would look like <(.*)>. That matches the whole set, including the brackets, but the part inside the parentheses around the .* is reported as a matched subgroup. Depending on the language you are using, you would need to use the library available to extract the subgroup match.
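For example, assuming Python (the question does not specify a language), the whole match and the subgroup can be read from the match object as sketched below. The helper name whole_and_inner is only illustrative.

```python
import re

PATTERN = re.compile(r"<(.*)>")

def whole_and_inner(text):
    """Return (whole match, subgroup) or None when the pattern does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # group(0) is the entire match, brackets included;
    # group(1) is what the first pair of parentheses captured.
    return m.group(0), m.group(1)

print(whole_and_inner("<1>"))
print(whole_and_inner("<12>"))
# With exactly one group in the pattern, findall returns the group contents.
print(PATTERN.findall("<12>"))
```

Other languages expose the same idea under different names, so the exact call depends on the regex library in use.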
Upvotes: 0