Reputation: 412
Suppose I have the following string:
$doc=<<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END
How can I capture both 11:20 and 13:55 with a single regular expression?
I don't know how to make part of a match optional, so that the following two tags can be ignored when they are present:
<a href=".....">
<font color="....">
訂票 means "book a ticket" (the website adds a link when a showing is available for booking).
Sorry for my bad English.
Below is my code; it doesn't work correctly.
#!/usr/bin/env perl
#use utf8;
use LWP::Simple;
binmode(STDIN, ':encoding(utf8)');
binmode(STDOUT, ':encoding(utf8)');
binmode(STDERR, ':encoding(utf8)');
my $doc = get 'http://www.atmovies.com.tw/showtime/theater_t06609_a06.html';
my @movies = ($doc =~ /<a href="\/movie\/([a-z]+\d+)\/">([^><]+)<\/a>.+?<UL>(.+?)<\/UL>/gs);
for($i=1; $i<=$#movies; $i+=3){
print "$movies[$i]\n";
print $movies[$i+1]."\n\n";
# this works just fine!
my @times = ($movies[$i+1] =~ /<LI>([^<>]+)\r\n\s+<\/LI>/g);
for($j=0; $j<=$#times; $j++){
print "$times[$j]\n";
}
# this regex doesn't work correctly; it captures nothing
@times_available=($movies[$i+1] =~ /<LI><a href="\/showtime\/ticket\/[0-9a-f]{32}\/" class="openbox">([^><\s]+) <font color=a81234 size=-1>☆訂票<\/font> <\/a> <\/LI>/g);
for($j=0; $j<=$#times_available; $j++){
print "$times_available[$j]\n";
}
}
Upvotes: 0
Views: 149
Reputation: 15121
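You could try this, which captures any run of digits and colons that directly follows a closing > (with optional whitespace in between):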
@times = $doc =~ m/>\s*([\d:]+)/g;
Here is the full test program:
#!/usr/bin/perl
use warnings;
use strict;
use utf8;
use Data::Dumper;
my $doc=<<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END
my @times = $doc =~ m/>\s*([\d:]+)/g;
print Dumper(\@times);
And the result:
$ perl t020.pl
$VAR1 = [
'11:20',
'13:55'
];
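If you would rather make the <a ...> tag explicitly optional, as the question asks, a stricter variant might look like the sketch below. It assumes every time has the form H:MM or HH:MM and appears right after <LI>, optionally preceded by a single opening <a ...> tag; the pattern is my own illustration and has only been checked against the sample string from the question, not the live page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Sample input copied from the question.
my $doc = <<'TEXT_END';
<LI>11:20
</LI>
<LI><a href="/showtime/ticket/4f3a3cc7017f4202b5add5803594fdd9/" class="openbox">13:55 <font color=a81234 size=-1>訂票</font> </a> </LI>
TEXT_END

# (?:<a\b[^>]*>)? is a non-capturing group followed by ?, so the
# opening <a ...> tag may appear once or not at all.
# (\d{1,2}:\d{2}) captures the time itself (assumed H:MM or HH:MM).
my @times = $doc =~ m/<LI>\s*(?:<a\b[^>]*>)?\s*(\d{1,2}:\d{2})/gi;

print "$_\n" for @times;
```

For the sample string this prints 11:20 and 13:55, one per line. The ? after a (?:...) group is the general way to mark part of a pattern as optional; the <font> tag needs no special handling here because it comes after the captured time.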
Upvotes: 1