I encountered the following problem: if I use the code in the first example, the variable $1 contains only the last digit of each number. However, if I use the third example, where the regex matches only the number itself, $1 holds the full number with all its digits. To me it appears that \d+ works differently in an alphanumeric context than in a purely numeric one.
Here are my questions: Can you reproduce this? Is this behavior intended? How can I capture the full number in the alphanumeric context using a regex in Perl? If \d is lazy by nature, can I make it greedier, and if so, how?
Example 1:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+(\d+)\w+/) {$num = $1; print $num,"\n";}'
Output:
9
0
Example 2:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\w+([0-9]+)\w+/) {$num = $1; print $num,"\n";}'
Output:
9
0
Example 3:
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/(\d+)/) {$num = $1; print $num,"\n";}'
Output:
199
200
Thanks in advance. Any help is highly appreciated.
Best, Chris
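The problem is that \w also matches digits, so the greedy \w+ at the start swallows every digit except the last one, which it has to give back so that (\d+) can match.
You should replace "\w" with "\D" ("not a digit"). For example: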
perl -e 'for ($i = 199; $i < 201; $i ++) { print "words".$i."words\n"}' | perl -ne 'if (/\A\D+(\d+)\D+/) {$num = $1; print $num,"\n";}'
Output:
199
200
Of course, if your data can contain more than one run of digits in a single string, you'll need a more precise regex.
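To illustrate that limitation, here is a minimal standalone Perl sketch. The input string abc12def345ghi is a hypothetical example, and the global match /(\d+)/g in list context is just one possible "more precise" approach, not necessarily what this answer had in mind. The \D-based pattern stops at the first run of digits, while the global match returns every run:

```perl
use strict;
use warnings;

# Hypothetical input with two separate runs of digits.
my $line = 'abc12def345ghi';

# The \D-anchored pattern from above only sees the first run of digits.
my ($first) = $line =~ /\A\D+(\d+)\D+/;

# A global match in list context returns every captured run of digits.
my @all = $line =~ /(\d+)/g;

print "first: $first\n";    # first: 12
print "all: @all\n";        # all: 12 345
```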
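The results you get are expected. In /\A\w+(\d+)\w+/, the first \w+ is a greedy pattern that grabs as many characters as it can, and since \w also matches digits, it consumes the digits too. It then backtracks only as far as needed for (\d+) to match a single digit and for the final \w+ to match the rest, so $1 ends up with just the last digit.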
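The backtracking becomes visible if you also capture the greedy prefix and the trailing part. A minimal sketch; the helper name split_greedy is hypothetical and only used for illustration:

```perl
use strict;
use warnings;

# Capture all three parts of the pattern so we can see how the string
# was divided up after the greedy \w+ backtracked.
sub split_greedy {
    my ($s) = @_;
    return $s =~ /\A(\w+)(\d+)(\w+)/ ? ($1, $2, $3) : ();
}

for my $s ('words199words', 'words200words') {
    my ($prefix, $digits, $suffix) = split_greedy($s);
    print "$s: prefix='$prefix' digits='$digits' suffix='$suffix'\n";
}
```

For words199words this prints prefix='words19' digits='9' suffix='words': the prefix keeps the 19 because handing back only the 9 is already enough for the rest of the pattern to match.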
Either use a lazy quantifier, /\A\w+?(\d+)\w+/, or subtract digits from \w (e.g. /\A[^\W\d]+(\d+)\w+/). The \w+? matches one or more word characters (letters, digits, _) but as few as possible, while [^\W\d] matches any letter or _ symbol, so there is no need for a lazy quantifier with that pattern.
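Both fixes can be checked side by side in a small standalone script. This is a sketch; the sub names number_lazy and number_no_digits_in_prefix are hypothetical:

```perl
use strict;
use warnings;

# Fix 1: lazy prefix. \w+? stops as early as possible, so (\d+) starts
# at the first digit and then matches greedily.
sub number_lazy {
    my ($s) = @_;
    return $s =~ /\A\w+?(\d+)\w+/ ? $1 : undef;
}

# Fix 2: a prefix that cannot match digits at all. [^\W\d] means
# "a word character that is not a digit" (letters and _ in ASCII text).
sub number_no_digits_in_prefix {
    my ($s) = @_;
    return $s =~ /\A[^\W\d]+(\d+)\w+/ ? $1 : undef;
}

for my $s ('words199words', 'words200words') {
    printf "%s: lazy=%s class=%s\n",
        $s, number_lazy($s), number_no_digits_in_prefix($s);
}
```

Each sub returns the full number (199 and 200) for the sample lines from the question.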