kay

Reputation: 186

Matching a pattern and replacing

I am trying to replace the h:mm:ss format held within a <time> tag in my XML file with the format "h hours, mm minutes, ss seconds". My regex works when the time tag opens and closes within a single line. It fails when the tag opens on one line and closes on the next.

Here is what I am trying:

while(<$rd>) {
   my $currLine = $_;
   $_ =~ s/\<time\> *(.):(..):(..) *\<\/time>/$1 hours, $2 minutes, $3 seconds/g;
   print FILE $_;
}

My input file looks like this:

<time> 1:04:55    </time> this is a good time <time> 
2:04:22 </time> to ask your question Alfred, 
but did you check time <time> 3:45:32 </time> and <time> 02:03:45 </time>

I am able to convert "h:mm:ss" to "h hours, mm minutes, ss seconds", but not for 2:04:22, because its tag opens on one line and closes on the next.

Upvotes: 1

Views: 97

Answers (2)

Samiron

Reputation: 5317

Don't you need a regex that can match across lines? Here is the code snippet I tried:

my $str = '<time> 1:04:55    </time> this is a good time <time>
2:04:22 </time> to ask your question Alfred,
but did you check time <time> 3:45:32 </time> and <time> 02:03:45 </time>';

$str =~ /<time>[\n\s]*(\d):(\d\d):(\d\d)[\n\s]*<\/time>/mg;
print $1, "\n";
print $2, "\n";
print $3, "\n";

OUTPUT

1
04
55

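Here the match can cross the line break because \s (and therefore [\n\s]) matches newline characters. Note that /m only changes ^ and $ so that they match at the start and end of each line; this pattern has no anchors, so /m has no effect on it. In a substitution, /g replaces every occurrence in the string rather than only the first.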
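
To see every match rather than only the first, the same pattern can be run in a while loop with /g. This is only a minimal sketch: the array name @found is introduced here for illustration, and the pattern assumes the hour has one or two digits.

```perl
use strict;
use warnings;

my $str = "<time> 1:04:55    </time> this is a good time <time>\n"
        . "2:04:22 </time> to ask your question Alfred,\n"
        . "but did you check time <time> 3:45:32 </time> and <time> 02:03:45 </time>";

my @found;
# In scalar context, each /g match resumes where the previous one ended,
# so the loop visits every <time> element in turn.
while ($str =~ /<time>\s*(\d{1,2}):(\d\d):(\d\d)\s*<\/time>/g) {
    push @found, "$1:$2:$3";
}

# Print one captured time per line.
print "$_\n" for @found;
```

The second time is found even though its tag spans two lines, because \s* consumes the newline.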

I didn't write the exact solution you need, just an illustration of how a regex can match across lines. Let me know if you need more help.

EDIT

I think it's worth showing the full substitution across lines for this question too.

my $str = '<time> 1:04:55    </time> this is a good time <time>
2:04:22 </time> to ask your question Alfred,
but did you check time <time> 3:45:32 </time> and <time> 02:03:45 </time>';

$str =~ s/<time>[\n\s]*(\d?\d):(\d\d):(\d\d)[\n\s]*<\/time>/$1 hours, $2 minutes, $3 seconds/mg;
print $str;

OUTPUT

1 hours, 04 minutes, 55 seconds this is a good time 2 hours, 04 minutes, 22 seconds to ask your question Alfred,
but did you check time 3 hours, 45 minutes, 32 seconds and 02 hours, 03 minutes, 45 seconds

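The key point is that the complete input must be in the string you apply the regular expression to. When the file is read line by line, no single line ever contains both the opening and the closing tag of a time that spans two lines.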
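
One way to get the complete input into a single string is to undefine the input record separator $/ before reading. The following is a sketch under assumptions, not a drop-in replacement for the asker's script: it reads from an in-memory filehandle so that it runs without a file on disk, and the variable names $xml and $content are made up for the example.

```perl
use strict;
use warnings;

# Sample input from the question, read through an in-memory filehandle
# so the sketch runs without any file on disk.
my $xml = "<time> 1:04:55    </time> this is a good time <time> \n"
        . "2:04:22 </time> to ask your question Alfred, \n"
        . "but did you check time <time> 3:45:32 </time> and <time> 02:03:45 </time>\n";

open my $rd, '<', \$xml or die "Cannot open in-memory handle: $!";

# With $/ undefined, a single read returns everything up to end of file.
my $content = do { local $/; <$rd> };
close $rd;

# \s matches newlines too, so a tag split across lines still matches.
$content =~ s{<time>\s*(\d{1,2}):(\d\d):(\d\d)\s*</time>}{$1 hours, $2 minutes, $3 seconds}g;

print $content;
```

For a real file, the in-memory handle would be replaced by an ordinary open on the file name. Slurping is simple but holds the whole file in memory, which may matter for very large inputs.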

Upvotes: 0

ysth

Reputation: 98388

Instead of reading line by line, read up to each </time>, and allow for whitespace other than ' ':

{
    use autodie 'open';
    open my $input, '<', 'input.xml';
    open my $output, '>', 'output.xml';
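    # Make each read return everything up to and including the next </time>.
    local $/ = '</time>';
    while (<$input>) {
        # \s* also matches newlines; \d{1,2} accepts hours such as 2 or 02.
        s/<time>\s*(\d{1,2}):(\d\d):(\d\d)\s*<\/time>/$1 hours, $2 minutes, $3 seconds/;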
        print $output $_;
    }
}
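
To illustrate what setting $/ to '</time>' does, here is a small sketch that only prints the records the read operator returns. It uses an in-memory filehandle and a shortened copy of the question's input; the names @records and $shown are made up for the example.

```perl
use strict;
use warnings;

my $xml = "<time> 1:04:55    </time> this is a good time <time> \n"
        . "2:04:22 </time> to ask your question Alfred, \n";

open my $input, '<', \$xml or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = '</time>';   # each read now ends at a closing tag
    @records = <$input>;
}
close $input;

# Show each record on one line, with newlines made visible as \n.
for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;
    print "record $i: [$shown]\n";
}
```

Each record holds at most one complete <time> element, even when the element spans a line break, so a single substitution without /g is enough per record. The final record is whatever text follows the last closing tag.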

Upvotes: 4
