user983043

Reputation: 111

Using Perl to extract information from text files

I have to parse multiple log files that look like dmesg output.

Example log file:

....
1399424400 4 abcd 2604 starting job (jobid=1325) for client abc.xyz.com, requesting resources now
 RESOURCE_GRANTED 1399424400 DiskVolume=/vol;DiskPool=pool1;Path=/mypath;Server=qwer.poil.com;
....

I need to write the jobid, client, disk volume, disk pool, etc. to an output file, one line per log file, so that the output file looks like:

 1325 abc.xyz.com /vol pool1 /mypath qwer.poil.com
 <file2 info>
 <file3 info>
 .....

I tried doing this to get the jobid:

 if(@grepres=grep{/jobid/} <TRY>){
 @splitres=split(' ',$grepres[0]);
 $jobid=$splitres[1];
 $jobid =~ s/\D//g;

Where TRY is the filehandle.

But it only returns the first number in the line, i.e. the timestamp.

How do I get the client name or the Server name?

Is Perl appropriate for this?

Upvotes: 0

Views: 329

Answers (3)

Nijin

Reputation: 65

A Perl regex is a good fit here. Since this is a log file, the format should stay stable, so a single pattern can pull out every field. The script below shows one way to do it.

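#!/usr/bin/perl
use strict;
use warnings;

# Lexical filehandles; stop immediately if either file cannot be opened
open(my $in,  '<', 'test')  or die "cannot open test file: $!";
open(my $out, '>', 'test1') or die "cannot open test1 file: $!";

# Read the whole file at once, because the jobid line and the
# RESOURCE_GRANTED line are separate lines in the log
my $log = do { local $/; <$in> };

# /s lets .*? cross the newline between the two lines
if ($log =~ /jobid=(\d+).*?client\s*(\w+\.\w+\.\w+).*?DiskVolume=(\/\w+).*?DiskPool=(\w+).*?Path=(\/\w+).*?Server=(\w+\.\w+\.\w+)/s)
{
print $out "$1 $2 $3 $4 $5 $6\n";
}
close($in);
close($out);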

The output I obtained:

[root@server perl]# cat test1
1325 abc.xyz.com /vol pool1 /mypath qwer.poil.com

Upvotes: 0

Borodin

Reputation: 126772

You should pull all of the data you need from each file into a hash before reformatting it.

This program starts with a list of the field names that you want to appear in the output, and builds a regex that matches those fields followed by their values.

Then all that is necessary is to find all occurrences of that pattern in all of the lines of the file and add them to the hash.

There is a final check to make sure that all the required fields are in the hash, and then the contents are printed as a simple hash slice.

Please ask if any of this is unclear to you.

use strict;
use warnings;

my @names = qw/ jobid client DiskVolume DiskPool Path Server /;
my @files = qw/ dmesg1.txt dmesg2.txt dmesg3.txt /;

my $re = join '|', @names;
$re = qr{ \b($re)\b [\s=]+ ([\w./]+) }x;

for my $filename ( @files ) {

  open my $fh, '<', $filename or do {
    warn "Can't open '$filename' for reading: $!";
    next;
  };

  my %data;
  while ( my $line = <$fh> ) {
    $data{$1} = $2 while $line =~ /$re/g;
  }

  if ( my @missing = grep { not exists $data{$_} } @names ) {
    warn sprintf 'Missing %s "%s" from file "%s"',
        @missing == 1 ? 'field' : 'fields',
        join(', ', @missing),
        $filename;
    next;
  }

  print "@data{@names}\n";
}

output

1325 abc.xyz.com /vol pool1 /mypath qwer.poil.com
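
To see what the pattern captures without touching any files, here is a minimal sketch that runs the same kind of regex over the two sample lines from the question. The `@sample` array and the `$summary` variable are illustrative names only, and the sketch assumes the lines look exactly like the example in the question.

```perl
use strict;
use warnings;

# Sample lines copied from the question; in real use these come from the log file.
my @sample = (
    '1399424400 4 abcd 2604 starting job (jobid=1325) for client abc.xyz.com, requesting resources now',
    ' RESOURCE_GRANTED 1399424400 DiskVolume=/vol;DiskPool=pool1;Path=/mypath;Server=qwer.poil.com;',
);

my @names = qw/ jobid client DiskVolume DiskPool Path Server /;

# Alternation of the field names, then "=" or whitespace, then the value.
my $alt = join '|', @names;
my $re  = qr{ \b($alt)\b [\s=]+ ([\w./]+) }x;

my %data;
for my $line (@sample) {
    # In list context, //g returns every (name, value) pair found on the line.
    my %found = $line =~ /$re/g;
    @data{ keys %found } = values %found;
}

my $summary = join ' ', @data{@names};
print "$summary\n";
```

The value class `[\w./]+` stops at the comma after the client name and at each semicolon, which is why no extra trimming is needed for this sample.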

Upvotes: 1

Ryan J

Reputation: 8323

If the lines always have the same format, you can loop over them with foreach, split each line as you did, and index into the resulting array to get the fields you want. Try this:

my @logfile = <TRY>;
close TRY;

my $jobid;

foreach my $line (@logfile) {
    chomp $line; # remove trailing newline

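    # only the "starting job" line carries the jobid; blank and other lines are skipped
    if ( $line =~ /jobid=/ ) {
        my @splitres = split(' ', $line);
        $jobid = $splitres[6];  # "(jobid=1325)"; index 1 is the "4" after the timestamp
        $jobid =~ s/\D//g;      # keep only the digits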

        # and so on with the remaining fields...
    }
}
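
For the remaining fields, one possible continuation of the split approach is sketched below. It assumes the RESOURCE_GRANTED line always has the `key=value;` list as its third whitespace-separated field, as in the sample from the question; the names `%resource` and `$resource_summary` are made up for this example.

```perl
use strict;
use warnings;

# Sample line copied from the question.
my $line = ' RESOURCE_GRANTED 1399424400 DiskVolume=/vol;DiskPool=pool1;Path=/mypath;Server=qwer.poil.com;';

# split ' ' discards leading whitespace, so the key=value list is the third field.
my @fields = split ' ', $line;
my $pairs  = $fields[2];

# Break "k1=v1;k2=v2;..." into a hash; split drops the empty field after the final ";".
my %resource = map { split /=/, $_, 2 } split /;/, $pairs;

my $resource_summary = join ' ', @resource{qw/ DiskVolume DiskPool Path Server /};
print "$resource_summary\n";
```

The limit of 2 on the inner split keeps a value intact even if it happens to contain an "=" of its own.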

Upvotes: 1
