Tim Hsu

Reputation: 412

Perl: How to do an optional regex match

Suppose I have the following string:

$doc=<<'TEXT_END';
<LI>11:20
           </LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>

TEXT_END

How can I capture both 11:20 and 13:55 with one regular expression?

I don't know how to make part of the match optional, so that the following two tags can be ignored when they are present:

<a href=".....">
<font color="...."> 

訂票 means "book a ticket". (The website adds a link when a showtime is available for booking.)

Sorry for my bad English.

Below is my code; it doesn't work correctly.

#!/usr/bin/env perl
#use utf8;
use LWP::Simple;

binmode(STDIN, ':encoding(utf8)');
binmode(STDOUT, ':encoding(utf8)');
binmode(STDERR, ':encoding(utf8)'); 

my $doc = get 'http://www.atmovies.com.tw/showtime/theater_t06609_a06.html';

my @movies = ($doc =~ /<a href="\/movie\/([a-z]+\d+)\/">([^><]+)<\/a>.+?<UL>(.+?)<\/UL>/gs);

for($i=1; $i<=$#movies; $i+=3){
    print "$movies[$i]\n";
    print $movies[$i+1]."\n\n";

    # this works just fine
    my @times = ($movies[$i+1] =~ /<LI>([^<>]+)\r\n\s+<\/LI>/g);
    for($j=0; $j<=$#times; $j++){
        print "$times[$j]\n";
    }

    # this regex doesn't work correctly; it captures nothing
    @times_available=($movies[$i+1] =~ /<LI><a href="\/showtime\/ticket\/[0-9a-f]{32}\/" class="openbox">([^><\s]+) &nbsp; <font color=a81234 size=-1>☆訂票<\/font>&nbsp;<\/a> <\/LI>/g);
    for($j=0; $j<=$#times_available; $j++){
        print "$times_available[$j]\n";
    }

}

Upvotes: 0

Views: 149

Answers (1)

Lee Duhem

Reputation: 15121

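You could try the pattern below. It matches a > followed by optional whitespace and captures the run of digits and colons after it, so it doesn't matter whether the time comes right after <LI> or after the <a ...> tag: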

@times = $doc =~ m/>\s*([\d:]+)/g;

Here is the full test program:

#!/usr/bin/perl

use warnings;
use strict;

use utf8;

use Data::Dumper;

my $doc=<<'TEXT_END';
<LI>11:20
           </LI>
       <LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>

TEXT_END

my @times = $doc =~ m/>\s*([\d:]+)/g;

print Dumper(\@times);

And the result:

$ perl t020.pl 
$VAR1 = [
          '11:20',
          '13:55'
        ];
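
If you would rather have the pattern say explicitly that the <a ...> tag is optional, one way is a non-capturing group followed by ?. This is only a sketch written against the sample string above, not the live page. It assumes every time looks like H:MM or HH:MM and sits right after <LI> or right after the opening <a> tag:

```perl
#!/usr/bin/perl

use warnings;
use strict;

use utf8;

my $doc = <<'TEXT_END';
<LI>11:20
           </LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 &nbsp; <font color=a81234 size=-1>訂票</font>&nbsp;</a> </LI>
TEXT_END

# (?: ... )? makes the opening <a ...> tag optional without capturing it,
# so the only capture group is the time itself.
my @times = $doc =~ m/<LI>\s*(?:<a\b[^>]*>\s*)?(\d{1,2}:\d{2})/g;

print "$_\n" for @times;
```

The <font> tag comes after the time, so the pattern never has to deal with it. Compared with the shorter pattern, this one is anchored on <LI>, so it should be less likely to pick up unrelated numbers elsewhere in the page.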

Upvotes: 1
