Reputation: 609
I need help splitting the following string into (Date, ID, msecs):
May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec
I only want the first part of the ID before the first underscore.
This is what I want the output to look like:
May 26 09:33:33, 0191070818, 180
I am having trouble figuring out what to put in the regex:
use strict;
use warnings;
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my @values = split('/[]/', $data);
foreach my $val (@values) {
print "$val\n";
}
exit 0;
Upvotes: 3
Views: 192
Reputation: 242333
split doesn't look like the correct tool for the job. I'd use a regex match:
my @values = $data =~ /^([[:alpha:]]{3}\s[0-9][0-9]\s[0-9][0-9]:[0-9][0-9]:[0-9][0-9]) # date & time
\s.*?\sID\s
([0-9]+) # ID
.*\stook\s
([0-9]+) # duration
\smsec/x;
print join(',', @values), "\n";
Upvotes: 3
Reputation: 69314
It might even be simplest to just split the data on whitespace (and then reconstruct the date by joining together the first three fields). It's not very sophisticated, but it gets the job done.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my @values = split(/\s+/, $data);
my $date = join ' ', @values[0,1,2];
my $id = $values[7];
my $time = $values[9];
say "Date: $date";
say "ID: $id";
say "Time: $time";
Which gives:
Date: May 26 09:33:33
ID: 0091070818_1432647213_489715
Time: 180
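The question asks for only the part of the ID before the first underscore. A minimal sketch of one way to finish the job with the same whitespace-split approach is to split the ID field again on underscores and keep the first piece. The variable names $short_id and $msecs are my own, and the field positions 7 and 9 assume every line has exactly the layout of the sample.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

my @values = split /\s+/, $data;

# Rebuild the date from the first three whitespace-separated fields.
my $date = join ' ', @values[0, 1, 2];

# Field 7 is the full ID; keep only the part before the first underscore.
# The parentheses give list context, so $short_id receives the first piece.
my ($short_id) = split /_/, $values[7];

# Field 9 is the duration (assumes the sample layout).
my $msecs = $values[9];

print join(', ', $date, $short_id, $msecs), "\n";
```

If the number of words before "ID" can vary, the fixed indices will point at the wrong fields, which is where a regex match becomes the safer choice.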
Upvotes: 4
Reputation: 425348
I don't know that split() is the best approach. This code matches your target ID and extracts it:
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
The regex uses a look-behind, (?<=ID ), to anchor the start of the match just to the right of "ID ", then grabs everything that follows up to the first underscore.
Here's some test code:
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($id) = $data =~ m/(?<=ID )[^_]+/g;
print "$id\n";
Output:
0091070818
Upvotes: 2
Reputation: 126762
It's probably best to do this with three separate patterns, as the code below demonstrates.
I've used the /x modifier so that I can put spaces in the regex patterns for improved readability.
Unless you are certain that your data will be well-formed (i.e. it is the output of a program), you should add tests to make sure that all three values are defined after the pattern match. Alternatively, you can test the result of the pattern match itself.
use strict;
use warnings;
use v5.10;
my $s = 'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec';
for ( $s ) {
my ($date) = / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
my ($id) = / ID \s+ (\d+) _ /x;
my ($msecs) = / (\d+) \s+ msec /x;
say join ',', $date, $id, $msecs;
}
Output:
May 26 09:33:33,0191070818,180
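The advice about checking that all three values are defined could look something like the sketch below. The helper name parse_line and the second, deliberately malformed sample line are my own inventions for illustration; the three patterns are the ones used in this answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, id, msecs), or an empty list
# if any of the three patterns fails to match.
sub parse_line {
    my ($line) = @_;

    my ($date)  = $line =~ / ^ ( [a-z]+ \s+ \d+ \s+ [\d:]+ ) /ix;
    my ($id)    = $line =~ / ID \s+ (\d+) _ /x;
    my ($msecs) = $line =~ / (\d+) \s+ msec /x;

    # A failed match in list context leaves its variable undefined.
    return unless defined($date) && defined($id) && defined($msecs);
    return ($date, $id, $msecs);
}

for my $line (
    'May 26 09:33:33 localhost archiver: saving ID 0191070818_1462647213_489705 took 180 msec',
    'May 26 09:33:34 localhost archiver: started',    # made-up line with no ID or duration
) {
    if ( my @fields = parse_line($line) ) {
        print join(',', @fields), "\n";
    }
    else {
        warn "Skipping unparseable line: $line\n";
    }
}
```

Returning an empty list on failure lets the caller test the result directly in an if condition, so a malformed record is reported instead of producing "uninitialized value" warnings later.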
Upvotes: 2
Reputation: 93805
split is not the tool to use here. Here is a regex that works, at least for the specific case you listed.
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
$data =~ m/^(\w+ \d+ \d\d:\d\d:\d\d).+saving ID (\d+).+took (\d+) msec$/;
my ($date, $id, $msec) = ($1,$2,$3);
print "$date, $id, $msec\n";
Upvotes: 1
Reputation: 53508
OK. That split just isn't going to work: because you've put the pattern in single quotes, the slashes stop being regex delimiters and become part of the pattern itself. The result isn't the pattern you intended, so it won't cut up your sample text the way you want.
Split 'cuts up' a string based on a field separator, which probably isn't what you want. E.g.
split ( ' ', $data );
Will give you:
$VAR1 = [
'May',
'26',
'09:33:33',
'localhost',
'archiver:',
'saving',
'ID',
'0091070818_1432647213_489715',
'took',
'180',
'msec'
];
Given your string doesn't really 'fieldify' like that properly, I'd suggest a different approach:
You need to select the things you want out of it. Assuming you're not getting some somewhat odd records mixed in:
my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';
my ($time_str) = ( $data =~ m/^(\w+ \d+ \d{2}:\d{2}:\d{2})/ );
my ($id) = ( $data =~ m/(\d+)_/ );
my ($msec) = ( $data =~ m/(\d+) msec/ );
print "$time_str, $id, $msec\n";
Note - you can combine your regex patterns (as some of the other examples indicate). I've done it this way hopefully to simplify and clarify what's happening. The regular expression match is applied to $data (because of =~). The 'matching' elements in brackets () are then extracted and 'returned' to be inserted into the variable on the lefthand side.
(Note - you need to have the 'my ($msec)' in brackets, because that puts the assignment in list context, so the captured value is used rather than the result of the test (true/false).)
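To make the point about the brackets concrete, here is a small sketch comparing the same match with and without them. The variable name $matched is my own; everything else comes from the code above.

```perl
use strict;
use warnings;

my $data = 'May 26 09:33:33 localhost archiver: saving ID 0091070818_1432647213_489715 took 180 msec';

# List context (brackets on the left): the match returns its captured groups.
my ($msec) = $data =~ m/(\d+) msec/;

# Scalar context (no brackets): the match returns only whether it succeeded.
my $matched = $data =~ m/(\d+) msec/;

print "list context:   $msec\n";       # the captured duration
print "scalar context: $matched\n";    # a true value, not the duration
```

Leaving the brackets off is an easy mistake to make, because the code still runs without warnings and simply stores a true value instead of the number you wanted.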
Upvotes: 4