In my last question I asked for the proper way to store data from a text file in my Perl script; the suggested solution was an AoH (array of hashes).
However, my implementation seems to be incomplete:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
# Open netstat output
my $netstat_dump = "tmp/netstat-output.txt";
open (my $fh, "<", $netstat_dump) or die "Could not open file '$netstat_dump': $!";
# Store data in a hash
my %hash;
while(<$fh>) {
    chomp;
    my ($Protocol, $RecvQ, $SendQ, $LocalAddress, $ForeignAddress, $State, $PID) = split(/\s+/);
    # Exclude $RecvQ and $SendQ
    $hash{$PID} = [$Protocol, $LocalAddress, $ForeignAddress, $State, $PID];
}
close $fh;
print Dumper \%hash;
The first problem is that I get an "uninitialized value" warning for $PID even though $PID is declared on the line above.
The second problem is that the script takes the trailing characters of some input lines and stores them under their own keys:
$VAR1 = {
...
'6907/thin' => [
'tcp',
'127.0.0.1:3001',
'0.0.0.0:*',
'LISTEN',
'6907/thin'
],
'' => [
'udp6',
':::49698',
':::*',
'31664/dhclient',
''
],
'r' => [
'udp6',
':::45016',
':::*',
'651/avahi-daemon:',
'r'
]
};
The '' => and 'r' => keys come from the input file, which looks like this:
tcp 0 0 0.0.0.0:3790 0.0.0.0:* LISTEN 7550/nginx.conf
tcp 0 0 127.0.1.1:53 0.0.0.0:* LISTEN 1271/dnsmasq
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 24202/cupsd
tcp 0 0 127.0.0.1:5432 0.0.0.0:* LISTEN 11222/postgres
tcp 0 0 127.0.0.1:3001 0.0.0.0:* LISTEN 6907/thin server (1
tcp 0 0 127.0.0.1:50505 0.0.0.0:* LISTEN 6874/prosvc
tcp 0 0 127.0.0.1:7337 0.0.0.0:* LISTEN 6823/postgres.bin
tcp6 0 0 ::1:631 :::* LISTEN 24202/cupsd
udp 0 0 0.0.0.0:46096 0.0.0.0:* 651/avahi-daemon: r
udp 0 0 0.0.0.0:5353 0.0.0.0:* 651/avahi-daemon: r
udp 0 0 127.0.1.1:53 0.0.0.0:* 1271/dnsmasq
udp 0 0 0.0.0.0:68 0.0.0.0:* 31664/dhclient
udp 0 0 0.0.0.0:631 0.0.0.0:* 912/cups-browsed
udp 0 0 0.0.0.0:37620 0.0.0.0:* 31664/dhclient
udp6 0 0 :::5353 :::* 651/avahi-daemon: r
udp6 0 0 :::45016 :::* 651/avahi-daemon: r
udp6 0 0 :::49698 :::* 31664/dhclient
It also looks as if my loop is not storing the whole file and stops somewhere along the way.
Upvotes: 4
Views: 246
Reputation: 6378
You might want to use, or look at the source of, some related CPAN modules to see how their authors have solved similar problems, e.g. Parse::Netstat, Regexp::Common, etc.
Upvotes: 1
Reputation: 29854
Sometimes splitting doesn't work as well as a full specification of the data you are likely to receive. Sometimes you need a regex, especially when you have a field that may or may not be there ("LISTEN").
You're also having a hard time separating the PID from the rest of the process information.
So here's my regex:
my $netstat_regex
    = qr{
        \A                      # The beginning of input
        ( \w+ )                 # the proto
        \s+
        (?: \d+ \s+ ){2}        # we don't care about these
        (                       # Open capture
            [[:xdigit:]:.]+?
            :
            (?: \d+ )
        )                       # Close capture
        \s+
        (                       # Open capture
            [[:xdigit:]:.]+?
            :
            (?: \d+ | \* )
        )                       # Close capture
        \s+
        (?: LISTEN \s+ )?       # It might not be a listen socket.
        ( \d+ )                 # Nothing but the PID
        /
        ( .*\S )                # All the other process data (trimmed)
    }x;
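Then I process it like so:
my %records;
while ( <$fh> ) {
    my %rec;
    @rec{ qw<proto local remote PID data> } = m/$netstat_regex/;
    # A failed match still creates the five keys (with undef values),
    # so test a captured field rather than the hash itself.
    if ( defined $rec{PID} ) {
        $records{ $rec{PID} } = \%rec;
    }
    else {
        print "Error processing input line #$.:\n$_\n";
    }
}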
Note that I also have some code to show me what doesn't fit my pattern, so that I can refine it if necessary. I don't give my full trust to the input.
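To see both branches in action, here is a minimal self-contained sketch that runs a condensed copy of the pattern over two lines from the question plus one deliberately invalid line. The @lines and @rejected arrays are only there for illustration; everything else follows the approach above, and later lines for the same PID would still overwrite earlier ones.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Condensed copy of the pattern above.
my $netstat_regex = qr{
    \A ( \w+ ) \s+                              # protocol
    (?: \d+ \s+ ){2}                            # Recv-Q and Send-Q, ignored
    ( [[:xdigit:]:.]+? : \d+ ) \s+              # local address and port
    ( [[:xdigit:]:.]+? : (?: \d+ | \* ) ) \s+   # remote address and port
    (?: LISTEN \s+ )?                           # optional state
    ( \d+ ) / ( .*\S )                          # PID, then process data
}x;

# Two lines from the question and one line that should be rejected.
my @lines = (
    'tcp 0 0 127.0.0.1:3001 0.0.0.0:* LISTEN 6907/thin server (1',
    'udp6 0 0 :::45016 :::* 651/avahi-daemon: r',
    'this line is not netstat output',
);

my ( %records, @rejected );
for my $line (@lines) {
    my %rec;
    @rec{ qw<proto local remote PID data> } = $line =~ $netstat_regex;
    if ( defined $rec{PID} ) {
        $records{ $rec{PID} } = \%rec;
    }
    else {
        push @rejected, $line;
    }
}

printf "%s: %s (%s)\n", $_, $records{$_}{data}, $records{$_}{proto}
    for sort { $a <=> $b } keys %records;
print "Rejected: $_\n" for @rejected;
```

The third line ends up in @rejected rather than in %records, which is the kind of feedback that lets you refine the pattern.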
Nice and tidy dump:
%records: {
11222 => {
PID => '11222',
data => 'postgres',
local => '127.0.0.1:5432',
proto => 'tcp',
remote => '0.0.0.0:*'
},
1271 => {
PID => '1271',
data => 'dnsmasq',
local => '127.0.1.1:53',
proto => 'udp',
remote => '0.0.0.0:*'
},
24202 => {
PID => '24202',
data => 'cupsd',
local => '::1:631',
proto => 'tcp6',
remote => ':::*'
},
31664 => {
PID => '31664',
data => 'dhclient',
local => ':::49698',
proto => 'udp6',
remote => ':::*'
},
651 => {
PID => '651',
data => 'avahi-daemon: r',
local => ':::45016',
proto => 'udp6',
remote => ':::*'
},
6823 => {
PID => '6823',
data => 'postgres.bin',
local => '127.0.0.1:7337',
proto => 'tcp',
remote => '0.0.0.0:*'
},
6874 => {
PID => '6874',
data => 'prosvc',
local => '127.0.0.1:50505',
proto => 'tcp',
remote => '0.0.0.0:*'
},
6907 => {
PID => '6907',
data => 'thin server (1',
local => '127.0.0.1:3001',
proto => 'tcp',
remote => '0.0.0.0:*'
},
7550 => {
PID => '7550',
data => 'nginx.conf',
local => '0.0.0.0:3790',
proto => 'tcp',
remote => '0.0.0.0:*'
},
912 => {
PID => '912',
data => 'cups-browsed',
local => '0.0.0.0:631',
proto => 'udp',
remote => '0.0.0.0:*'
}
}
Upvotes: 4
Reputation: 50637
You can remove the state column before calling split(), so that every row has the same number of columns:
# assuming that state is always upper case followed by spaces and digit(s)
$State = s/\b([A-Z]+)(?=\s+\d)// ? $1 : "";
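As a rough sketch of how that substitution could fit into the parsing loop, under the same assumption about what a state looks like: the limit of 6 passed to split is my addition, not part of the suggestion above, and is there so that a program name containing a space stays in one field. The sample lines are taken from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @lines = (
    'tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 24202/cupsd',
    'udp 0 0 0.0.0.0:5353 0.0.0.0:* 651/avahi-daemon: r',
);

my @rows;
for my $line (@lines) {
    my $copy = $line;    # work on a copy so the original line is untouched

    # Assumption: the state is an upper-case word followed by whitespace
    # and a digit (the start of the PID). Cut it out and remember it.
    my $State = $copy =~ s/\b([A-Z]+)(?=\s+\d)// ? $1 : "";

    # With the state gone every row has six columns; the limit keeps
    # "651/avahi-daemon: r" together in the last one.
    my ( $Protocol, $RecvQ, $SendQ, $LocalAddress, $ForeignAddress, $PID )
        = split ' ', $copy, 6;

    push @rows, [ $Protocol, $LocalAddress, $ForeignAddress, $State, $PID ];
}

print join( ' | ', @$_ ), "\n" for @rows;
```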
Upvotes: 2
Reputation: 241838
If your input contains tabs, you can split on /\t/ instead. \s+ matches any run of whitespace, i.e. one tab as well as two tabs, so the "empty columns" are skipped.
Fixing that still doesn't hash all the lines from the input, though. Hash keys must be unique, but the input contains some PIDs more than once (in the sample shown, 1271/dnsmasq and 24202/cupsd twice, 31664/dhclient three times, and 651/avahi-daemon: r four times). You can solve the problem by using a HoAoA (hash of arrays of arrays) instead:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my $netstat_dump = 'input.txt';
open my $FH, '<', $netstat_dump or die "Could not open file '$netstat_dump': $!";
my %hash;
while (<$FH>) {
    chomp;
    my ($Protocol, $RecvQ, $SendQ, $LocalAddress, $ForeignAddress, $State, $PID)
        = split /\t/;
    push @{ $hash{$PID} }, [ $Protocol, $LocalAddress, $ForeignAddress, $State, $PID ];
}
close $FH;
print Dumper \%hash;
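Reading the structure back takes two nested loops. The sketch below is only an illustration: the %hash literal is hand-written in the shape the script above would produce for three of the question's lines (assuming a tab-separated input where the empty state column yields an empty string), and @report is a name I made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written sample in the HoAoA shape:
# PID string => [ [ protocol, local, foreign, state, pid ], ... ]
my %hash = (
    '1271/dnsmasq' => [
        [ 'tcp', '127.0.1.1:53', '0.0.0.0:*', 'LISTEN', '1271/dnsmasq' ],
        [ 'udp', '127.0.1.1:53', '0.0.0.0:*', '',       '1271/dnsmasq' ],
    ],
    '912/cups-browsed' => [
        [ 'udp', '0.0.0.0:631', '0.0.0.0:*', '', '912/cups-browsed' ],
    ],
);

my @report;
for my $pid ( sort keys %hash ) {
    my @sockets = @{ $hash{$pid} };    # outer array: one entry per input line
    push @report, sprintf '%s has %d socket(s)', $pid, scalar @sockets;
    for my $socket (@sockets) {        # inner array: the fields of one line
        my ( $protocol, $local, $foreign, $state ) = @$socket;
        push @report, sprintf '  %-4s %-15s %s', $protocol, $local, $state || '-';
    }
}
print "$_\n" for @report;
```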
Upvotes: 2
Reputation: 6642
When you split a line such as:
udp 0 0 0.0.0.0:37620 0.0.0.0:* 31664/dhclient
on whitespace you get 6 fields, not 7. This is because the state column is empty, so the PID gets assigned to $State and $PID is left undefined.
Likewise,
udp 0 0 0.0.0.0:5353 0.0.0.0:* 651/avahi-daemon: r
stores "651/avahi-daemon:" in the 6th field (state) and "r" in the 7th (PID), because of the space between the colon and the r in the PID column.
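A quick way to check this yourself is to count what split returns for each kind of line. This is just a diagnostic sketch using three lines from the question; @counts is an illustrative name, and the indexes printed are zero-based, so [5] is what lands in $State and [6] is what lands in $PID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @counts;
for my $line (
    'tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 24202/cupsd',
    'udp 0 0 0.0.0.0:37620 0.0.0.0:* 31664/dhclient',
    'udp 0 0 0.0.0.0:5353 0.0.0.0:* 651/avahi-daemon: r',
) {
    my @fields = split /\s+/, $line;
    push @counts, scalar @fields;

    # $fields[6] is undefined when the state column is empty
    printf "%d fields, [5]=%s, [6]=%s\n",
        scalar @fields, $fields[5], $fields[6] // 'undef';
}
# prints:
# 7 fields, [5]=LISTEN, [6]=24202/cupsd
# 6 fields, [5]=31664/dhclient, [6]=undef
# 7 fields, [5]=651/avahi-daemon:, [6]=r
```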
You may want to look into using unpack to split apart fixed width fields. Note that if the input has varying column widths based on content, you will need to determine column widths to use unpack.
Refer to the tutorial for a how-to for this.
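Here is a minimal sketch of the unpack approach. The column widths in $template are assumptions, not measured from real netstat output, so check them against the header line of your own dump. The sample lines are built with sprintf using the same widths purely to keep the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed column widths: 6, 7, 7, 24, 24, 12, then the rest of the line.
# "A" fields have their trailing spaces stripped by unpack.
my $template = 'A6 A7 A7 A24 A24 A12 A*';

# Build sample lines with matching widths (illustration only).
my $layout = '%-6s%-7s%-7s%-24s%-24s%-12s%s';
my @lines  = (
    sprintf( $layout, 'tcp', 0, 0, '127.0.0.1:631', '0.0.0.0:*', 'LISTEN', '24202/cupsd' ),
    sprintf( $layout, 'udp', 0, 0, '0.0.0.0:5353',  '0.0.0.0:*', '',       '651/avahi-daemon: r' ),
);

my @rows;
for my $line (@lines) {
    my ( $Protocol, $RecvQ, $SendQ, $LocalAddress, $ForeignAddress, $State, $PID )
        = unpack $template, $line;
    push @rows, {
        Protocol       => $Protocol,
        LocalAddress   => $LocalAddress,
        ForeignAddress => $ForeignAddress,
        State          => $State,      # empty string when the column is blank
        PID            => $PID,        # embedded spaces are preserved
    };
}

printf "%-4s %-15s state='%s' pid='%s'\n",
    @{$_}{ qw(Protocol LocalAddress State PID) } for @rows;
```

Because the fields are cut by position rather than by whitespace, an empty state column and a space inside the program name no longer shift the other fields.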
Upvotes: 5