Reputation: 25
I would like to use egrep/grep -E to print out the lines in C source files that contain integer literals (as described here). The following works for the most part, except it matches floats too:
egrep '\b[0-9]+' *.c
Any suggestions for how to fix this?
Upvotes: 0
Views: 1613
Reputation: 18980
I would not try to over-optimize a pattern like this. Instead, just translate each integer literal form and its possible suffixes literally into a regex with alternations:
(?i)(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?
Only the digit separators require some more work: a separator cannot be followed by another separator and can only appear between digits.
Suffixes are allowed for hex and binary too, as tested with C++14 here.
Note: The pattern is designed to be case-insensitive.
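Run it like this (the non-capturing groups and \d are Perl-compatible syntax, so this needs GNU grep's -P option rather than egrep; note also that -ei would make "i" the pattern, because -e takes the next characters as its argument): grep -Pi "(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?" input.txt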
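To make the effect of the pattern concrete, here is a minimal sketch that runs it over a small made-up file. The file name sample.c and its contents are assumptions for illustration only, and the -P option assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample input; sample.c is only an illustrative file name.
cat > sample.c <<'EOF'
int a = 42;
unsigned long long b = 0xFFull;
int c = 0b1010;
const char *s = "no digits here";
EOF

# -P selects Perl-compatible regexes (GNU grep), -i ignores case,
# -o prints only the matched text, one match per line.
pattern="(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?"
matches=$(grep -Pio "$pattern" sample.c)
printf '%s\n' "$matches"
```

On this input the command prints 42, 0xFFull and 0b1010, one per line. Because the pattern has no boundaries around it, it will also pick up digit runs inside identifiers and inside floating-point literals.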
PS: If you just want to extract the values, a Perl script could come in handy:
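use strict;
use warnings;

my $file = '/some/where/input.txt';
# Case-insensitive pattern for hex, binary and decimal literals with optional suffixes
my $regex = qr/(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?/i;

open my $input, '<', $file or die "can't open $file: $!";
while (my $line = <$input>) {
    chomp $line;
    # Print every match found on the line, one per output line
    while ($line =~ /($regex)/g) {
        print "$1\n";
    }
}
close $input or die "can't close $file: $!";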
Upvotes: 0
Reputation: 19651
You can use negative lookarounds to make sure the number is neither preceded nor followed by a dot (.):
\b(?<!\.)[0-9]+(?!\.)\b
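As a rough check of this pattern, the sketch below runs it over a small made-up file. Lookarounds are not part of POSIX extended regexes, so it uses grep -P (assuming a GNU grep with PCRE support) instead of egrep; the file name floats.c and its contents are assumptions for illustration.

```shell
# Hypothetical sample input; floats.c is only an illustrative file name.
cat > floats.c <<'EOF'
int n = 10;
double d = 3.14;
float f = 2.5f;
int m = n + 7;
EOF

# -P: Perl-compatible regex (GNU grep); -n prefixes each matching line
# with its line number.
int_lines=$(grep -Pn '\b(?<!\.)[0-9]+(?!\.)\b' floats.c)
printf '%s\n' "$int_lines"
```

Only lines 1 and 4 are printed; the lines containing 3.14 and 2.5f are skipped because every digit run on them touches a dot.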
Edit:
Since you want to only match the 0 of 0x in hex literals as you mentioned in the comments, use the following pattern instead. It works exactly like your original regex except that it doesn't match float numbers.
\b(?<!\.)[0-9]+(?![\.\d])
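The behaviour on hex literals and floats can be seen in a small sketch like the following. The file name hex.c and its contents are made up for illustration, and grep -P again assumes a GNU grep with PCRE support.

```shell
# Hypothetical sample input; hex.c is only an illustrative file name.
printf '%s\n' 'int h = 0x1F;' 'double d = 12.5;' 'int k = 123;' > hex.c

# -P: Perl-compatible regex (GNU grep); -o prints only the matched text.
tokens=$(grep -Po '\b(?<!\.)[0-9]+(?![\.\d])' hex.c)
printf '%s\n' "$tokens"
```

This prints 0 (from 0x1F) and 123. Nothing is reported for 12.5: the lookahead also rejects a following digit, so the engine cannot backtrack to a shorter run such as the 1 of 12.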
Upvotes: 1