Reputation: 2291
I have a script and a file.
[evelden@vatasu4435 perl]$ cat file
06:35:42,734
foo 06:35:42 bar
[evelden@vatasu4435 perl]$ cat script
#!/usr/bin/perl
while(<>){
    if(s/(\d\d:\d\d).*/\1/){
        print;
    }
}
So the regex has .* at the end, but not at the front.
Running it:
[evelden@vatasu4435 perl]$ ./script file
06:35
foo 06:35
Apparently .* at the end takes as much as possible, which is OK.
What I do not understand is where 'foo' comes from in the output. That is my question.
If I change the regex to s/.*(\d\d:\d\d).*/\1/, so that there is also a .* at the front, then the output is what I expected:
[evelden@vatasu4435 perl]$ script file
35:42
35:42
Now it is greedy at the front, but that is OK.
Upvotes: 1
Views: 60
Reputation: 6798
OP's original regex is not specific about where the match should start.
s/(\d\d:\d\d).*/\1/
looks in the string for \d{2}:\d{2} and anything after it, then substitutes the matched text (\d{2}:\d{2}.*, the digits plus everything following them) with the captured \d{2}:\d{2}. Nothing in the pattern covers what comes before \d{2}:\d{2}, so no replacement is applied to that part and foo is not touched.
Perhaps OP intended to write the following code
use strict;
use warnings;
s/.*?(\d{2}:\d{2}):.*/$1/ && print for <>;
Two simple solutions to the problem
use strict;
use warnings;
use feature 'say';
while(<DATA>) {
    # Print only when the line actually contains a time.
    say $1 if /(\d{2}:\d{2}):/;
}
__DATA__
06:35:42,734
foo 06:35:42 bar
Or another variation
use strict;
use warnings;
use feature 'say';
while(<DATA>) {
    # Print only when the line actually contains a time.
    say $1 if /\b(\d{2}:\d{2})/;
}
__DATA__
06:35:42,734
foo 06:35:42 bar
Or maybe as follows
use strict;
use warnings;
use feature 'say';
my $data = do { local $/; <DATA> };
my @time = $data =~ /\b(\d{2}:\d{2})/g;
say for @time;
__DATA__
06:35:42,734
foo 06:35:42 bar
Output
06:35
06:35
Upvotes: 1
Reputation: 780724
Only the part of the line that matches the regular expression is replaced by s///. Since the regexp isn't anchored on the left, it matches the part of the line beginning with the time, and replaces that part. The part before the match is left unchanged, so foo remains in the line.
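To make that visible, here is a minimal sketch that reports which slice of each line the question's pattern matches. It relies on Perl's built-in @- and @+ arrays, which hold the start and end offsets of the last successful match. The helper name match_span is made up for illustration and is not from the posts above.

```perl
use strict;
use warnings;

# match_span is a hypothetical helper: it splits a line into the text
# before the match, the matched text, and the replacement ($1).
sub match_span {
    my ($line) = @_;
    return unless $line =~ /(\d\d:\d\d).*/;
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($line, 0, $start),              # before the match, left alone
        substr($line, $start, $end - $start),  # the part s/// would replace
        $1,                                    # what it is replaced with
    );
}

for my $line ('06:35:42,734', 'foo 06:35:42 bar') {
    my ($before, $matched, $replacement) = match_span($line);
    print "before=[$before] matched=[$matched] result=[$before$replacement]\n";
}
```

For the second line the match starts at offset 4, so "foo " sits in front of the matched text and is carried into the result unchanged.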
Upvotes: 2
Reputation: 54323
The content of the current line is placed in $_. Your s/// operates on that $_, substituting the complete pattern with the content of $1 (or \1, as you've put it). That's the content of the first capture group in the pattern. But your pattern is not anchored, so it will start matching somewhere in the string, and replace from there. It's doing exactly what you have told it.
If you wanted to get rid of everything in the front, your second pattern is correct. If you wanted to only change lines that start with the pattern, use a ^ anchor at the front.
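As a rough side-by-side sketch of those options, the snippet below runs several variants of the substitution over the two sample lines from the question. The labels in %variant are made up for illustration, the lazy .*? variant is borrowed from another answer in this thread, and the /r flag (non-destructive substitution) assumes Perl 5.14 or newer.

```perl
use strict;
use warnings;

# Each entry applies one variant of the substitution to a copy of the line.
# With /r the modified copy is returned and the original is left untouched;
# if nothing matches, the line comes back unchanged.
my %variant = (
    unanchored   => sub { $_[0] =~ s/(\d\d:\d\d).*/$1/r },     # question's first regex
    greedy_front => sub { $_[0] =~ s/.*(\d\d:\d\d).*/$1/r },   # question's second regex
    lazy_front   => sub { $_[0] =~ s/.*?(\d\d:\d\d).*/$1/r },  # non-greedy prefix
    start_only   => sub { $_[0] =~ s/^(\d\d:\d\d).*/$1/r },    # ^ anchor
);

for my $name (qw(unanchored greedy_front lazy_front start_only)) {
    for my $line ('06:35:42,734', 'foo 06:35:42 bar') {
        printf "%-12s %-20s -> %s\n", $name, "'$line'", $variant{$name}->($line);
    }
}
```

The greedy prefix swallows as much as it can, so the capture lands on the last possible match (35:42), while the lazy prefix stops at the first one (06:35). The ^ variant leaves the foo line alone because that line does not start with a time.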
Upvotes: 3