Reputation: 1221
I need to write a Perl regex that matches the number inside a word containing both letters and digits.
Example: test123. I want the regex to match only the numeric part and capture it.
I am trying \S*(\d+)\S*, but it captures only the 3, not 123.
Upvotes: 0
Views: 234
Reputation: 29844
If the digits must be preceded by at least one other character (as in your example), you could use one of the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
(The second one is more in tune with your \S* specification.)
Your expression matches any string that contains one or more digits, and that may be what you want. It even matches a string of digits surrounded by spaces (" 123 "), because zero-or-more non-space characters can be satisfied by the empty string at the boundary between the last space and the first digit, and the same is true of the boundary between the '3' and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
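To make the difference concrete, here is a minimal sketch comparing the greedy pattern from the question with the lazy \S+? variant. The variable names are only illustrative, and the last case shows the catch with requiring a prefix: a bare number loses its first digit to \S+?.

```perl
use strict;
use warnings;

# Greedy prefix: \S* grabs as much as it can and gives back only one digit.
my ($greedy) = "test123" =~ /\S*(\d+)\S*/;

# Lazy prefix: \S+? grabs as little as it can, so \d+ gets all the digits.
my ($lazy) = "test123" =~ /\S+?(\d+)/;

# The lazy form still needs one non-space character before the digits,
# so with a bare number that character is the first digit itself.
my ($bare) = "123" =~ /\S+?(\d+)/;

print "greedy=$greedy lazy=$lazy bare=$bare\n";   # greedy=3 lazy=123 bare=23
```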
Upvotes: 1
Reputation: 385556
Regex atoms will match as much as they can.
Initially, the first \S*
matched "test123
", but the regex engine had to backtrack to allow \d+
to match. The result is:
 +--------------------- Matches "test12"
 |     +--------------- Matches "3"
 |     |     +--------- Matches ""
 |     |     |
---  -----  ---
\S*  (\d+)  \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It'll try to match at position 0, then position 1, and so on until it finds a digit; then it will match as many digits as it can.
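One way to check the breakdown above is to wrap each \S* in its own capture group. This is only a sketch; the extra parentheses are added purely for illustration and are not part of the original pattern.

```perl
use strict;
use warnings;

# Extra groups around each \S* show what every part of the pattern matched.
my @parts = "test123" =~ /(\S*)(\d+)(\S*)/;
print join("|", @parts), "\n";   # test12|3|

# Dropping the \S* parts lets \d+ take the whole run of digits.
my ($num) = "test123" =~ /(\d+)/;
print "$num\n";                  # 123
```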
Upvotes: 9
Reputation: 149736
\S matches any non-whitespace character, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Upvotes: 1
Reputation: 1655
"something122320" =~ /(\d+)/
will capture 122320 (in $1, or as the return value in list context); this is probably what you're trying to do ;)
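What the match "returns" depends on context. A small sketch, with illustrative variable names, of the two usual ways to get at the captured digits:

```perl
use strict;
use warnings;

my $str = "something122320";

# List context: the match returns the captured substrings.
my ($number) = $str =~ /(\d+)/;

# Scalar context: the match returns true or false, and the capture is in $1.
my $matched      = $str =~ /(\d+)/;
my $from_dollar1 = $matched ? $1 : undef;

print "$number $from_dollar1\n";   # 122320 122320
```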
Upvotes: 1
Reputation: 338
The * quantifiers in your regex are greedy; that's why they also "eat" the digits. Exactly as @Marc said, you don't need them.
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
Upvotes: 1
Reputation: 11613
I think parentheses signify capture groups, which is exactly what you don't want. Remove them. You're looking for /\d+/ or /[0-9]+/
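For comparison, here is a short sketch of what each form yields in Perl (variable names are illustrative). Without parentheses, a successful match in list context returns just 1, so the digits have to be read from $& instead; with parentheses, the digits come back directly.

```perl
use strict;
use warnings;

my $str = "test123";

# No capture group: a successful match in list context returns (1).
my ($no_group) = $str =~ /\d+/;

# With a capture group: the match returns the captured text.
my ($with_group) = $str =~ /(\d+)/;

# Without a group, the matched text is still available in $&.
my $whole = $str =~ /\d+/ ? $& : undef;

print "$no_group $with_group $whole\n";   # 1 123 123
```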
Upvotes: -1