secmask

Reputation: 8107

Why doesn't this regular expression match?

I have a Perl script for the Squid web proxy:

#!/usr/bin/perl
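$|=1;                # autoflush STDOUT so each printed line is written out immediately
while (<>) {
    @X = split;      # split the input line on whitespace
    $x = $X[0];      # first field: printed back in front of each result
    $_ = $X[1];      # second field: the URL that the regexes below are tested against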
    if (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=22).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "&" . $3 . "\n";
    # youtube Normal screen always HD itag 35, Normal screen never HD itag 34, itag=18 <--normal?
    } elsif (m/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=[0-9]*).*?\&(id=[a-zA-Z0-9]*)/) {
        print $x . "http://video-srv.youtube.com.SQUIDINTERNAL/" . $2 . "&" . $3 . "\n";

    } else {
        print $x . $_ . "\n";
    }
}

I got it from http://wiki.squid-cache.org/ConfigExamples/DynamicContent/YouTube. I've tested input such as

http://v24.lscache6.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor%2Coc%3AU0hPRVFUTl9FSkNOOV9JTlJF&fexp=905230%2C901013&algorithm=throttle-factor&itag=34&ipbits=0&burst=40&sver=3&signature=2A5088FD4F64CF9D58A5B798E14452D71B51BAE8.2EABF06D09C8C81650266C5464CF1D0B4D6C25CC&expire=1300190400&key=yt1&ip=0.0.0.0&factor=1.25&id=e838f2cd3549e3cb

in RegexBuddy with Perl syntax, and it matches the second regular expression in the script above. But it doesn't match when I run the script. I'm not a Perl programmer, so where am I going wrong?
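One way to narrow this down, sketched under the assumption that each input line carries the URL as its second whitespace-separated field (which is what `$_ = $X[1]` implies): test the second pattern both on the bare URL, as RegexBuddy does, and on input lines split the way the script splits them. The leading "0" field in the last case is a hypothetical placeholder, and `script_matches` is an illustrative helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Second pattern from the script, copied unchanged.
my $re = qr/^http:\/\/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com).*?\&(itag=[0-9]*).*?\&(id=[a-zA-Z0-9]*)/;

# Test URL from the question.
my $url = 'http://v24.lscache6.c.youtube.com/videoplayback?sparams=id%2Cexpire%2Cip%2Cipbits%2Citag%2Calgorithm%2Cburst%2Cfactor%2Coc%3AU0hPRVFUTl9FSkNOOV9JTlJF&fexp=905230%2C901013&algorithm=throttle-factor&itag=34&ipbits=0&burst=40&sver=3&signature=2A5088FD4F64CF9D58A5B798E14452D71B51BAE8.2EABF06D09C8C81650266C5464CF1D0B4D6C25CC&expire=1300190400&key=yt1&ip=0.0.0.0&factor=1.25&id=e838f2cd3549e3cb';

# Mimic the script: split the input line on whitespace and test only the
# second field, like "@X = split; $_ = $X[1];" does.
sub script_matches {
    my ($line) = @_;
    my @X = split ' ', $line;
    return 0 unless defined $X[1];
    return $X[1] =~ $re ? 1 : 0;
}

# 1. The URL on its own, as RegexBuddy sees it.
if ($url =~ $re) {
    print "direct: match ($2&$3)\n";
}

# 2. A line holding only the URL: $X[1] is undef, so nothing can match.
print "URL only:   ", (script_matches($url) ? "match" : "no match"), "\n";

# 3. A line with a hypothetical extra field ("0") in front of the URL.
print "0 then URL: ", (script_matches("0 $url") ? "match" : "no match"), "\n";
```

If the direct test matches but the URL-only line does not, the regex itself is fine and the difference lies in how the input line is split into fields.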

Upvotes: 0

Views: 352

Answers (2)

jb.

Reputation: 10331

Why not use the URI parser module? Here is a simple example using one. That way you can get the host with a simple $uri->host() and check it against your list of hosts. You should also be able to get the itag and id fields regardless of what order they're in, or whether other parameters are present, either of which could break a regex.
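A minimal sketch of the URI-based approach, assuming the CPAN URI module is installed (it is not part of core Perl). URI->new parses the URL, host returns the host name, and query_form returns the query parameters as key/value pairs, so itag and id can be read in any order. The helper name video_key, the host list, and the sample URL are illustrative assumptions; the return value mirrors the "itag=...&id=..." string the script builds.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;    # CPAN "URI" distribution; not part of core Perl

# Host suffixes taken from the script's regex. The bare-IP "[0-9.]{4}"
# alternative is left out of this sketch.
my @video_hosts = (
    qr/\.youtube\.com$/,
    qr/\.googlevideo\.com$/,
    qr/\.video\.google\.com$/,
);

# Hypothetical helper: returns "itag=N&id=X" for a matching video URL,
# or undef when the scheme, host, or required query fields do not fit.
sub video_key {
    my ($url) = @_;
    my $uri = URI->new($url);
    return unless ($uri->scheme // '') eq 'http';   # only http URIs have host() here
    my $host = $uri->host;
    return unless grep { $host =~ $_ } @video_hosts;
    my %q = $uri->query_form;    # key/value pairs, whatever order they appear in
    return unless defined $q{itag} && defined $q{id};
    return "itag=$q{itag}&id=$q{id}";
}

# Hypothetical URL with id before itag and an extra parameter in between.
print video_key('http://v1.example.youtube.com/videoplayback?id=abc123&sver=3&itag=22'), "\n";
```

Because the parameters end up in a hash, reordering them or adding new ones does not change the result, which is the robustness the answer is pointing at.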

Upvotes: 1

Беров

Reputation: 1393

I would recommend splitting the regex into separate variables and then modifying one of them at a time. This way you can find the problem yourself.

I am not sure anyone will bother to debug your program for you. Example:

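 my $part1 = qr/http:\/\//;
 my $part2 = qr/([0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com)/;
 my $part3 = qr/.*?\&(itag=[0-9]*)/;
 my $part4 = qr/.*?\&(id=[a-zA-Z0-9]*)/;
 # ...then combine them, and change or drop one part at a time:
 if (m/^$part1$part2$part3$part4/) {
     print "matched: $2&$3\n";
 }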
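The same idea can be automated. This sketch (the part names and the helper first_failing_part are hypothetical) appends the pieces of the script's second pattern one at a time and reports the first piece after which the growing pattern stops matching. Because a qr// object stringifies as a self-contained non-capturing group, the pieces can be concatenated without the host alternation leaking into its neighbours.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The script's second pattern split into named pieces (names are hypothetical).
my @parts = (
    [ scheme => qr/^http:\/\// ],
    [ host   => qr/(?:[0-9.]{4}|.*\.youtube\.com|.*\.googlevideo\.com|.*\.video\.google\.com)/ ],
    [ itag   => qr/.*?\&itag=[0-9]*/ ],
    [ id     => qr/.*?\&id=[a-zA-Z0-9]*/ ],
);

# Add the pieces one at a time and return the name of the first piece after
# which the growing pattern no longer matches; return undef if all of it matches.
sub first_failing_part {
    my ($input) = @_;
    my $so_far = '';
    for my $part (@parts) {
        my ($name, $re) = @$part;
        $so_far .= $re;    # stringifies to "(?^:...)", a self-contained group
        return $name unless $input =~ /$so_far/;
    }
    return;
}

for my $input ('http://v1.youtube.com/videoplayback?key=yt1&itag=34&id=abc',
               'http://v1.youtube.com/watch?v=abc',
               'http://example.com/videoplayback?key=yt1&itag=34&id=abc') {
    my $failed = first_failing_part($input);
    print "$input: ", (defined $failed ? "stops matching at '$failed'" : "full match"), "\n";
}
```

For these three sample inputs it reports a full match, a failure at 'itag' (there is no "&itag=" parameter), and a failure at 'host'.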

Upvotes: 1
