user2007843

Reputation: 609

Splitting a string using regex in Perl

I need help splitting the following string into (Date, ID, msecs)

May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec

I only want the first part of the ID before the first underscore.

This is what I want the output to look like:

May 26 09:33:33, 0191070818, 180

I am having trouble figuring out what to put in the regex.

use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split('/[]/', $data);

foreach my $val (@values) {
  print "$val\n";
}

exit 0;

Upvotes: 3

Views: 192

Answers (6)

choroba

Reputation: 242333

split doesn't look like the correct tool for the job. I'd use a regex match:

my @values = $data =~ /^([[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) # date & time
                       \s.*?\sID\s
                       ([0-9]+)            # ID
                       .*\stook\s
                       ([0-9]+)            # duration
                       \smsec/x;
print join(',', @values), "\n";
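
As a variation on the same idea, here is a minimal sketch that uses named captures (available from Perl 5.10) so each value can be read back by name from the %+ hash instead of by position. The capture names date, id and msec are my own choice for illustration, not something the question prescribes.

```perl
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my $line;
if ( $data =~ /^(?<date>[[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9])
               \s.*?\sID\s
               (?<id>[0-9]+)
               .*\stook\s
               (?<msec>[0-9]+)
               \smsec/x ) {
    # %+ holds the named captures of the last successful match
    $line = join ',', $+{date}, $+{id}, $+{msec};
    print "$line\n";
}
```

The pattern itself is unchanged; only the way the captured values are retrieved differs.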

Upvotes: 3

Dave Cross

Reputation: 69314

It might even be simplest to just split the data on whitespace (and then reconstruct the date by joining together the first three fields). It's not very sophisticated, but it gets the job done.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split(/\s+/, $data);

my $date = join ' ', @values[0,1,2];
my $id   = $values[7];
my $time = $values[9];

say "Date: $date";
say "ID:   $id";
say "Time: $time";

Which gives:

Date: May 26 09:33:33
ID:   0091070818_1432647213_489715
Time: 180
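
The question asks for only the part of the ID before the first underscore, which the output above still includes in full. A minimal sketch of one way to trim it, assuming the fields always sit at the same positions as in the sample line, is a second split on the underscore ($short_id is just an illustrative name):

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split /\s+/, $data;

# Keep only the text before the first underscore of the ID field.
# The limit of 2 stops split after the first underscore.
my ($short_id) = split /_/, $values[7], 2;

# "@values[0 .. 2]" interpolates the first three fields joined by spaces.
my $line = join ', ', "@values[0 .. 2]", $short_id, $values[9];
print "$line\n";
```

For the sample line this prints May 26 09:33:33, 0091070818, 180.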

Upvotes: 4

Bohemian

Reputation: 425348

I don't know that split() is the best approach. This code matches your target ID and extracts it:

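my ($id) = $data =~ m/(?<=ID )[^_]+/g;

The regex uses a look-behind (?<=ID ) to anchor the start of the match just to the right of "ID ", then matches the run of non-underscore characters that follows. The /g flag matters here: the pattern has no capture groups, so in list context it returns the matched text only when /g is present.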


Here's some test code:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
print "$id\n";

Output:

0091070818


Upvotes: 2

Borodin

Reputation: 126762

It's probably best to do this with three separate patterns, as the code below demonstrates.

I've used the /x modifier so that I can put spaces in the regex patterns for improved readability.

Unless you are certain that your data will be well-formed (i.e. it is the output of a program), you should add tests to make sure that all three values are defined after the pattern match. Alternatively, you can test the result of the pattern match itself.

use strict;
use warnings;
use v5.10;

my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';

for ( $s ) {

    my ($date)  = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = / ID \s+ (\d+) _ /x;
    my ($msecs) = / (\d+) \s+ msec /x;

    say join ',', $date, $id, $msecs;
}

Output:

May 26 09:33:33,0191070818,180
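
The defined-ness tests mentioned above could look something like this. It is only a sketch: parse_line is a hypothetical helper name, and the second input line is a made-up malformed record used to show the failure path.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, id, msecs), or an empty list
# if any of the three patterns fails to match.
sub parse_line {
    my ($line) = @_;

    my ($date)  = $line =~ / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = $line =~ / ID \s+ (\d+) _ /x;
    my ($msecs) = $line =~ / (\d+) \s+ msec /x;

    # A failed match in list context leaves the variable undefined.
    return unless defined($date) && defined($id) && defined($msecs);
    return ( $date, $id, $msecs );
}

my @lines = (
    'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec',
    'May 26 09:33:34 localhost archiver: starting up',    # made-up malformed record
);

my @report;
for my $line (@lines) {
    my @fields = parse_line($line);
    push @report, @fields ? join( ',', @fields ) : 'skipped: no match';
}
print "$_\n" for @report;
```

The first line is reported as before; the second has no ID or msec part, so it is skipped instead of producing warnings about uninitialized values.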

Upvotes: 2

Andy Lester

Reputation: 93805

split is not the tool to use here. Here is a regex that works, at least for the specific case you listed.

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

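# Assign the captures directly and stop if the line does not match,
# rather than reading $1, $2, $3 after an unchecked match.
my ($date, $id, $msec) = $data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/
    or die "Line did not match the expected format\n";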

print "$date, $id, $msec\n";

Upvotes: 1

Sobrique

Reputation: 53508

OK. That split just isn't going to work: because you've wrapped the pattern in single quotes, the slashes are no longer regex delimiters but part of the pattern itself, so split never sees the pattern you meant to write.

Split 'cuts up' a string based on a field separator, which probably isn't what you want. E.g.

 split ( ' ', $data ); 

Will give you:

$VAR1 = [
          'May',
          '26',
          '09:33:33',
          'localhost',
          'archiver:',
          'saving',
          'ID',
          '0091070818_1432647213_489715',
          'took',
          '180',
          'msec'
        ];

Given your string doesn't really 'fieldify' like that properly, I'd suggest a different approach:

You need to select the things you want out of it. Assuming you don't have oddly formatted records mixed in:

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
my ($id)       = ( $data =~ m/(\d+)_/ );
my ($msec)     = ( $data =~ m/(\d+) msec/ );
print "$time_str, $id, $msec\n";

Note - you can combine your regex patterns (as some of the examples indicate). I've done it this way hopefully to simplify and clarify what's happening. The regular expression match is applied to $data (because of =~). The 'matching' elements in brackets () are then extracted and 'returned' to be inserted into the variable on the lefthand side.

(Note - you need to have the 'my ($msec)' in brackets, because the brackets put the match in list context, so the captured value is assigned rather than the true/false result of the test.)
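
To make the difference concrete, here is a small sketch that runs the same match twice, once assigned to a plain scalar and once to a bracketed variable. The name $matched is only for illustration.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# Scalar context: the match returns true (1) or false, not the capture.
my $matched = $data =~ m/(\d+) msec/;

# List context (brackets on the left): the match returns the captures.
my ($msec) = $data =~ m/(\d+) msec/;

print "scalar context: $matched\n";
print "list context:   $msec\n";
```

The first print shows 1 (the match succeeded), while the second shows 180, the captured value.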

Upvotes: 4
