user2955219

Reputation: 11

How to extract text between words in another file

I am trying to pull a certain segment of information from a text file and write it to another file. Below are firewall logs; the only information that matters to me is the IP address and port after "inside/" and the IP address and port after "outside/".

May 24 10:21:53 10.110.9.18 v3306 %FWSM-4-106100: access-list inside permitted tcp inside/10.110.27.5(53264) -> outside/172.23.240.2(1984) hit-cnt 1 (1-second interval) [0xee13216c, 0x0] 

May 24 10:21:53 10.110.9.18 v3306 %FWSM-4-106100: access-list inside permitted tcp inside/10.110.27.5(53265) -> outside/10.110.2.5(1984) hit-cnt 1 (1-second interval) [0xee13216c, 0x0] 

For the first log line, I would like the output to look like this:

10.110.27.5(53264) -> 172.23.240.2(1984)

It would be nice if there were a way to remove duplicates as well.

Upvotes: 1

Views: 143

Answers (2)

mpapec

Reputation: 50637

perl -nE'@r= /(?:inside|outside)\/(\S+)/g and say join" -> ", @r' file

Without duplicates (the %s hash records each pair that has already been printed):

perl -nE'@r= /(?:inside|outside)\/(\S+)/g and !$s{"@r"}++ and say join" -> ", @r' file

or, the same thing spread over several lines:

perl -nE'
  @r= /(?:inside|outside)\/(\S+)/g;
  if (@r and !$s{"@r"}++) { say join" -> ", @r }
' file
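
For readers who find the one-liners terse, here is a minimal sketch of the same idea as a standalone script. The helper name extract_endpoints and the @lines array are hypothetical, introduced only for illustration; the sample data is the two log lines from the question, with the first repeated to exercise the duplicate filter.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every "ip(port)" token that follows
# "inside/" or "outside/" on one log line, in order of appearance.
sub extract_endpoints {
    my ($line) = @_;
    my @tokens = $line =~ /(?:inside|outside)\/(\S+)/g;
    return @tokens;
}

# Sample lines from the question; the third repeats the first.
my @lines = (
    'May 24 10:21:53 10.110.9.18 v3306 %FWSM-4-106100: access-list inside permitted tcp inside/10.110.27.5(53264) -> outside/172.23.240.2(1984) hit-cnt 1 (1-second interval) [0xee13216c, 0x0]',
    'May 24 10:21:53 10.110.9.18 v3306 %FWSM-4-106100: access-list inside permitted tcp inside/10.110.27.5(53265) -> outside/10.110.2.5(1984) hit-cnt 1 (1-second interval) [0xee13216c, 0x0]',
    'May 24 10:21:53 10.110.9.18 v3306 %FWSM-4-106100: access-list inside permitted tcp inside/10.110.27.5(53264) -> outside/172.23.240.2(1984) hit-cnt 1 (1-second interval) [0xee13216c, 0x0]',
);

my %seen;     # pairs already collected
my @unique;   # distinct pairs in first-seen order
for my $line (@lines) {
    my @r = extract_endpoints($line);
    next unless @r;                       # skip lines with no match
    my $pair = join ' -> ', @r;
    push @unique, $pair unless $seen{$pair}++;
}

print "$_\n" for @unique;
```

The pattern requires the slash after "inside" or "outside", so the bare word in "access-list inside" is not picked up.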

Upvotes: 4

Joe Z

Reputation: 17936

I will assume that both the inside and outside addresses are on the same line. You should be able to scan through the file with a loop like this, printing each match the first time it is seen:

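use strict;
use warnings;

# Assumption: the log file path is passed as the first command-line argument.
my $logfile = shift @ARGV or die "usage: $0 LOGFILE\n";
open my $fh, "<", $logfile or die "can't open $logfile for reading: $!\n";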

my %seen;  # used for filtering dupes.

while (<$fh>)
{
    my $line = $_;

    if ($line =~ /inside\/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\([0-9]+\)).*outside\/([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\([0-9]+\))/)
    {
        my $hit = "$1 -> $2";
        print $hit, "\n" if (++$seen{$hit} == 1);
    }
}
close $fh;

I think that should work.

It's entirely possible that the regex above is overspecific. The following code uses one that's a bit more relaxed:

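use strict;
use warnings;

# Assumption: the log file path is passed as the first command-line argument.
my $logfile = shift @ARGV or die "usage: $0 LOGFILE\n";
open my $fh, "<", $logfile or die "can't open $logfile for reading: $!\n";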

my %seen;  # used for filtering dupes.

while (<$fh>)
{
    my $line = $_;

    # Match "inside/" with its slash so the earlier "access-list inside" is not captured.
    if ($line =~ /(inside\/.*outside\/[^)]*\))/)
    {
        my $hit = $1;
        $hit =~ s/(inside|outside)\///g;  # remove 'inside/' and 'outside/' from string.
        print $hit, "\n" if (++$seen{$hit} == 1);
    }
}
close $fh;
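
Since the question also asks for the result to be written to another file, here is a minimal sketch of that step under the same one-pair-per-line assumption. The sub name write_unique_pairs and the file names in the usage comment are hypothetical, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: copy each distinct "inside -> outside" pair from
# $in_path to $out_path, keeping first-seen order. Returns the pair count.
sub write_unique_pairs {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "can't open $in_path for reading: $!\n";
    open my $out, '>', $out_path or die "can't open $out_path for writing: $!\n";

    my %seen;  # used for filtering dupes
    while (my $line = <$in>) {
        # Require the slash so "access-list inside" is not mistaken for a match.
        next unless $line =~ m{inside/(\S+).*outside/(\S+)};
        my $hit = "$1 -> $2";
        print {$out} "$hit\n" unless $seen{$hit}++;
    }

    close $in;
    close $out or die "can't close $out_path: $!\n";
    return scalar keys %seen;
}

# Runs only when both paths are given, e.g.: perl extract.pl fw.log pairs.txt
if (@ARGV == 2) {
    my $count = write_unique_pairs(@ARGV);
    print "wrote $count unique pairs\n";
}
```

Opening the output with '>' truncates any existing file; '>>' would append instead, but then duplicates from earlier runs would not be filtered.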

Upvotes: 2
