Reputation: 12230
I'm trying to scrape the Apache status page for the IP addresses it lists. Here is an example row from an Apache status page:
<tr><td><b>0-35</b></td><td>1791</td><td>1/1079/387615</td><td>G
</td><td>5541.08</td><td>379</td><td>557</td><td>135.0</td><td>33.04</td><td>20992.04
</td><td>83.60.245.1</td><td nowrap></td><td nowrap></td></tr>
I have downloaded the page like this:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
use feature 'say';
use File::Slurp;
my $content = get('http://www.apache.org/server-status') or die 'Unable to get page';
write_file('filename',$content);
How can I create an array of the IP addresses found?
Thanks
Upvotes: 0
Views: 1639
Reputation: 22461
Just find every sequence of four groups of 1-3 digits separated by dots, and check that each group is in the 0-255 range:
while ($content =~ /(?<!\d)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d)/g) {
    if (
        $1 >= 0 && $1 <= 255 &&
        $2 >= 0 && $2 <= 255 &&
        $3 >= 0 && $3 <= 255 &&
        $4 >= 0 && $4 <= 255
    ) {
        print "$1.$2.$3.$4\n";
    }
}
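Since the question asks for an array rather than printed output, here is a minimal sketch that pushes each validated address onto an array using the same pattern. `extract_ips` is a hypothetical helper name, and the sample row is trimmed from the question's example plus a made-up out-of-range address (522.53.0.0) to show the range check at work:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every dotted quad whose four octets are all 0..255.
sub extract_ips {
    my ($content) = @_;
    my @ips;
    # (?<!\d) and (?!\d) keep the match from starting or ending
    # inside a longer run of digits.
    while ($content =~ /(?<!\d)(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?!\d)/g) {
        my @octets = ($1, $2, $3, $4);
        # \d{1,3} already excludes negatives, so only the upper bound matters.
        push @ips, join('.', @octets) unless grep { $_ > 255 } @octets;
    }
    return @ips;
}

# Sample input: part of the question's row plus a made-up invalid address.
my $row = '<td>135.0</td><td>33.04</td><td>20992.04</td>'
        . '<td>83.60.245.1</td><td>522.53.0.0</td>';
my @ips = extract_ips($row);
print "$_\n" for @ips;
```

With the downloaded page, calling `extract_ips($content)` would give the full list; numbers such as 135.0 or 20992.04 never form four dot-separated groups, so they are not picked up.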
Upvotes: 4
Reputation: 938
I'd use the Regexp::Common module, available on CPAN, like so:
use Regexp::Common qw /net/;
while ($content =~ m!<td>($RE{net}{IPv4})</td>!g) {
    print "IP: $1\n";
}
Upvotes: 5
Reputation: 465
\d{1,3}(\.\d{1,3}){3}
This pattern is not strict enough to match only valid IPv4 addresses, but it should be enough for your page. For example, it would also match 522.53.0.0.
Demo: http://regexr.com/3dc1q
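The loose pattern can be paired with a numeric check to build the array directly. This is a sketch assuming the page is already in a string; `loose_ips` and `octets_ok` are hypothetical helpers. The inner group is made non-capturing on purpose: in list context `m//g` returns the capture groups when the pattern has any, so the original `(\.\d{1,3})` would yield only the last `.nnn` piece of each match.

```perl
use strict;
use warnings;

# Loose dotted-quad pattern; (?:...) keeps it free of capture groups so that
# a list-context m//g returns whole matches.
my $loose = qr/\d{1,3}(?:\.\d{1,3}){3}/;

# Hypothetical helper: true when no octet exceeds 255
# (\d{1,3} already rules out negative values).
sub octets_ok {
    my ($ip) = @_;
    return !grep { $_ > 255 } split /\./, $ip;
}

# Hypothetical helper: collect every loose match, then keep the valid ones.
sub loose_ips {
    my ($content) = @_;
    my @candidates = $content =~ /$loose/g;
    return grep { octets_ok($_) } @candidates;
}

my $row = '<td>33.04</td><td>20992.04</td><td>83.60.245.1</td><td>522.53.0.0</td>';
print "$_\n" for loose_ips($row);
```

Unlike the lookaround version in the first answer, this pattern has no digit boundaries, so in a string like 1234.5.6.7 it would pick up 234.5.6.7. On the status page, where each address sits alone in a `<td>` cell, that should not come up.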
Upvotes: 3