kjwill555

Reputation: 25

Regex to match C integer literals

I would like to use egrep/grep -E to print the lines in C source files that contain integer literals (as described here). The following works for the most part, except that it also matches floats:

egrep '\b[0-9]+' *.c

Any suggestions for how to fix this?

Upvotes: 0

Views: 1613

Answers (2)

wp78de

Reputation: 18980

I would not try to over-optimize a pattern like this. Just translate each integer literal form and its possible suffixes literally into a regex with alternations:

(?i)(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?

Only the digit separators require some more work: a separator cannot be followed by another separator and can only appear between digits.

Suffixes are allowed for hex and binary too, as tested with C++14 here.


Note: The pattern is designed to be case-insensitive.
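
To make the separator rule concrete, here is a minimal sketch in Python that mirrors the alternation above and checks whole tokens against it. The names INT_LITERAL and is_integer_literal are made up for illustration, and re.IGNORECASE stands in for the (?i) prefix.

```python
import re

# Mirrors the pattern above: hex, binary, or decimal digits, each allowing a
# single ' between digits, followed by an optional integer suffix.
# INT_LITERAL and is_integer_literal are illustrative names only.
INT_LITERAL = re.compile(
    r"(?:0x[0-9a-f]+(?:'?[0-9a-f]+)*"
    r"|0b[01]+(?:'?[01]+)*"
    r"|\d+(?:'?\d+)*)"
    r"(?:ull|ll|ul|l|u)?",
    re.IGNORECASE,
)


def is_integer_literal(token: str) -> bool:
    """Return True if the whole token matches the integer-literal pattern."""
    return INT_LITERAL.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ["1'000'000", "1''000", "0x1F'FFul", "42ULL", "1.5"]:
        print(token, is_integer_literal(token))
```

Because each repetition is "optional separator, then at least one digit", two separators in a row or a trailing separator cannot match. Note that fullmatch tests a complete token; when searching inside a line, the pattern will still find the digit runs inside a float such as 1.5.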

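Run it with a PCRE-capable grep, because (?:...) and \d are not part of the ERE syntax that egrep understands. With GNU grep, for example: grep -Pi "(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?" input.txt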

PS: If you just want to extract the values, a Perl script could come in handy:

use strict;
use warnings;

my $file = '/some/where/input.txt';
# Case-insensitive pattern: hex, binary, or decimal digits plus an optional suffix.
my $regex = qr/(?:0x(?:[0-9a-f]+(?:'?[0-9a-f]+)*)|0b(?:[10]+(?:'?[10]+)*)|\d+(?:'?\d+)*)(?:ull|ll|ul|l|u)?/i;
open my $input, '<', $file or die "can't open $file: $!";
while (my $line = <$input>) {
    chomp $line;
    # Print every match on the line, one per output line.
    while ($line =~ /($regex)/g) {
        print "$1\n";
    }
}
close $input or die "can't close $file: $!";

Upvotes: 0

41686d6564

Reputation: 19651

You can use negative lookarounds to make sure the number isn't preceded or followed by a . (lookarounds need a PCRE engine, e.g. grep -P rather than egrep):

\b(?<!\.)[0-9]+(?!\.)\b

Edit:

Since you only want to match the 0 of 0x in hex literals, as you mentioned in the comments, use the following pattern instead. It works exactly like your original regex except that it doesn't match floating-point numbers.

\b(?<!\.)[0-9]+(?![\.\d])
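
As a quick check of what this pattern accepts, here is a small Python sketch that applies it to a line of C code. The names NO_FLOATS and integer_tokens are hypothetical, and the sample line is invented for the demonstration.

```python
import re

# Same lookaround pattern as above: a run of digits that starts at a word
# boundary, is not preceded by '.', and is not followed by '.' or another digit.
# NO_FLOATS and integer_tokens are illustrative names only.
NO_FLOATS = re.compile(r"\b(?<!\.)[0-9]+(?![.\d])")


def integer_tokens(line: str) -> list[str]:
    """Return every digit run on the line that the pattern accepts."""
    return NO_FLOATS.findall(line)


if __name__ == "__main__":
    sample = "int a = 42; double b = 3.14; int c = 0x1F;"
    print(integer_tokens(sample))  # ['42', '0']
```

The 3 is rejected by the lookahead (a . follows it) and the 14 by the lookbehind (a . precedes it), while 0x1F contributes only its leading 0, as intended. This only covers floats written with a decimal point.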


Upvotes: 1
