yonetpkbji

Reputation: 1019

Perl Regular expression to insert string at specific places

I have the following piece of code:

#!/usr/bin/perl

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();

    for (@insert_words) {
        # Use package vars to communicate with /(?{})/ blocks.
        my $insert_word = $_;
        local our @paths;
        $path =~ m{
            ^(.*/)([^/]*)((?:/.*)?)\z
            (?{
                push @paths, "$1$insert_word$2$3";
                if (length($2)) {
                    push @paths, "$1$insert_word$3";
                    push @paths, "$1$2$insert_word$3";
                }
            })
            (?!)
        }x;

        for (@paths) {
            $url->path($_);
            print "$url\n";
        }
    }
}

__DATA__
http://www.bagandboxfactory.com/index.php?route=checkout/
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/

At the moment, the code above works perfectly for the stackoverflow and superuser URLs in __DATA__. It gives the following output for the stackoverflow URL:

http://www.stackoverflow.com/dog/cat/rabbit/HELLO
http://www.stackoverflow.com/dog/cat/HELLOrabbit/
http://www.stackoverflow.com/dog/cat/HELLO/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/HELLOcat/rabbit/
http://www.stackoverflow.com/dog/HELLO/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/HELLOdog/cat/rabbit/
http://www.stackoverflow.com/HELLO/cat/rabbit/
http://www.stackoverflow.com/dogHELLO/cat/rabbit/

As you can see, for each path segment delimited by a slash (/), it inserts the string HELLO before the segment, in place of the segment, and after the segment.

The problem I am having:

I want the same thing to happen when an equals sign (=) is found in the URL.

Using http://www.bagandboxfactory.com/index.php?route=checkout/ as an example, I want the output to be the following:

http://www.bagandboxfactory.com/index.php?route=HELLOcheckout/   <- puts HELLO before the string after the equals
http://www.bagandboxfactory.com/index.php?route=HELLO/   <- replaces the string after the equals with HELLO
http://www.bagandboxfactory.com/index.php?route=checkoutHELLO/  <- puts HELLO after the string that is after the equals

I thought that changing the regular expression from

^(.*/)([^/]*)((?:/.*)?)\z

to

^(.*[/=])([^/=]*)((?:[/=].*)?)\z

would work, but it does not.

What would I need to change in the regular expression to make it do this?

Your help is much appreciated, many thanks

__UPDATE__

It needs to be able to handle multiple parameters. For example, given the URL http://www.example.com/dog/cat=2&foo=5, the output I should get is as follows:

http://www.example.com/HELLOdog/cat=2&foo=5  
http://www.example.com/HELLO/cat=2&foo=5
http://www.example.com/dogHELLO/cat=2&foo=5
http://www.example.com/dog/cat=HELLO2&foo=5
http://www.example.com/dog/cat=HELLO&foo=5
http://www.example.com/dog/cat=2HELLO&foo=5
http://www.example.com/dog/cat=2&foo=HELLO5
http://www.example.com/dog/cat=2&foo=HELLO
http://www.example.com/dog/cat=2&foo=5HELLO

The code I already have does this correctly for every slash it comes across in the URL, but I now want it to do the same when it comes across an = in the URL as well (or any other character I choose to specify in the regex, e.g. [/=@-]).

Upvotes: 1

Views: 313

Answers (1)

Greg Bacon

Reputation: 139661

Putting the regex engine to work to perform the backtracking for you is a clever technique.
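To see the backtracking technique in isolation, here is a minimal sketch that only records the three captures for each way the pattern can match. The helper name all_splits and the sample path /dog/cat/ are made up for illustration; the pattern itself is the one from the question.

```perl
use strict;
use warnings;

# Record every way the pattern can match: the code block runs on each
# successful attempt, then (?!) forces a failure so the engine
# backtracks and tries the next split point.
sub all_splits {
    my ($string) = @_;
    # Package variable, as in the original code, so the code block can see it.
    local our @found;
    $string =~ m{
        ^(.*/)([^/]*)((?:/.*)?)\z
        (?{ push @found, "$1|$2|$3" })
        (?!)
    }x;
    return @found;
}

print "$_\n" for all_splits('/dog/cat/');
```

Because .* is greedy, the splits come out from the end of the string towards the start, which is why the output in the question lists the rabbit variants before the dog variants.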

The main problem is that everything after the question mark is the query, which is available through $url->query. $url->path returns only the path component, without the query, so a regex applied to $path never sees the equals sign in route=checkout/.
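A small sketch of the path/query distinction, assuming the URI module from the question is installed. The helper name path_and_query is hypothetical; path and query are the URI accessors already used in this thread.

```perl
use strict;
use warnings;
use URI;

# Return the path and the query of a URL as two separate values.
# The query is undef when the URL has no '?' part.
sub path_and_query {
    my ($string) = @_;
    my $url = URI->new($string);
    return (scalar $url->path, scalar $url->query);
}

for my $string ('http://www.bagandboxfactory.com/index.php?route=checkout/',
                'http://www.stackoverflow.com/dog/cat/rabbit/') {
    my ($path, $query) = path_and_query($string);
    printf "path=%s query=%s\n", $path, defined $query ? $query : '(none)';
}
```

For the first URL the path is /index.php and the query is route=checkout/, so the two parts have to be matched separately.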

Modifying your code to

#!/usr/bin/perl

use strict;
use warnings;

use URI qw( );

my @insert_words = qw( HELLO );

while (<DATA>) {
    chomp;
    my $url = URI->new($_);
    my $path = $url->path();
    my $query = $url->query;

    for (@insert_words) {
      # Use package vars to communicate with /(?{})/ blocks.
      my $insert_word = $_;
      local our @paths;
      $path =~ m{
         ^(.*/)([^/]*)((?:/.*)?)\z
         (?{
            push @paths, "$1$insert_word$2$3";
            if (length($2)) {
               push @paths, "$1$insert_word$3";
               push @paths, "$1$2$insert_word$3";
            }
         })
         (?!)
      }x;

      local our @queries;
      if (defined $query) {
          $query =~ m{
              ^(.*[/=])([^/=&]*)((?:[/=&].*)?)\z
              (?{
                  if (length $2) {
                      push @queries, "$1$insert_word$2$3";
                      push @queries, "$1$insert_word$3";
                      push @queries, "$1$2$insert_word$3";
                  }
              })
              (?!)
          }x;
      }

      for (@paths) {
          $url->path($_);

          if (@queries) {
              for (@queries) {
                  $url->query($_);
                  print $url, "\n";
              }
          }
          else {
              print $url, "\n";
          }
      }
    }
}

__DATA__
http://www.bagandboxfactory.com/index.php?route=checkout/
http://www.stackoverflow.com/dog/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/
http://www.example.com/index.php?route=9&other=7/

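gives the following output. The logic for query replacements is slightly different: it only produces variants when $2 is non-empty, because otherwise it would append each of the @insert_words after the trailing slash in the query, if present. Note also that every path variant is combined with every query variant.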

http://www.bagandboxfactory.com/HELLOindex.php?route=HELLOcheckout/
http://www.bagandboxfactory.com/HELLOindex.php?route=HELLO/
http://www.bagandboxfactory.com/HELLOindex.php?route=checkoutHELLO/
http://www.bagandboxfactory.com/HELLO?route=HELLOcheckout/
http://www.bagandboxfactory.com/HELLO?route=HELLO/
http://www.bagandboxfactory.com/HELLO?route=checkoutHELLO/
http://www.bagandboxfactory.com/index.phpHELLO?route=HELLOcheckout/
http://www.bagandboxfactory.com/index.phpHELLO?route=HELLO/
http://www.bagandboxfactory.com/index.phpHELLO?route=checkoutHELLO/
http://www.stackoverflow.com/dog/cat/rabbit/HELLO
http://www.stackoverflow.com/dog/cat/HELLOrabbit/
http://www.stackoverflow.com/dog/cat/HELLO/
http://www.stackoverflow.com/dog/cat/rabbitHELLO/
http://www.stackoverflow.com/dog/HELLOcat/rabbit/
http://www.stackoverflow.com/dog/HELLO/rabbit/
http://www.stackoverflow.com/dog/catHELLO/rabbit/
http://www.stackoverflow.com/HELLOdog/cat/rabbit/
http://www.stackoverflow.com/HELLO/cat/rabbit/
http://www.stackoverflow.com/dogHELLO/cat/rabbit/
http://www.superuser.co.uk/dog/cat/rabbit/hamster/HELLO
http://www.superuser.co.uk/dog/cat/rabbit/HELLOhamster/
http://www.superuser.co.uk/dog/cat/rabbit/HELLO/
http://www.superuser.co.uk/dog/cat/rabbit/hamsterHELLO/
http://www.superuser.co.uk/dog/cat/HELLOrabbit/hamster/
http://www.superuser.co.uk/dog/cat/HELLO/hamster/
http://www.superuser.co.uk/dog/cat/rabbitHELLO/hamster/
http://www.superuser.co.uk/dog/HELLOcat/rabbit/hamster/
http://www.superuser.co.uk/dog/HELLO/rabbit/hamster/
http://www.superuser.co.uk/dog/catHELLO/rabbit/hamster/
http://www.superuser.co.uk/HELLOdog/cat/rabbit/hamster/
http://www.superuser.co.uk/HELLO/cat/rabbit/hamster/
http://www.superuser.co.uk/dogHELLO/cat/rabbit/hamster/
http://www.example.com/HELLOindex.php?route=9&other=HELLO7/
http://www.example.com/HELLOindex.php?route=9&other=HELLO/
http://www.example.com/HELLOindex.php?route=9&other=7HELLO/
http://www.example.com/HELLOindex.php?route=HELLO9&other=7/
http://www.example.com/HELLOindex.php?route=HELLO&other=7/
http://www.example.com/HELLOindex.php?route=9HELLO&other=7/
http://www.example.com/HELLO?route=9&other=HELLO7/
http://www.example.com/HELLO?route=9&other=HELLO/
http://www.example.com/HELLO?route=9&other=7HELLO/
http://www.example.com/HELLO?route=HELLO9&other=7/
http://www.example.com/HELLO?route=HELLO&other=7/
http://www.example.com/HELLO?route=9HELLO&other=7/
http://www.example.com/index.phpHELLO?route=9&other=HELLO7/
http://www.example.com/index.phpHELLO?route=9&other=HELLO/
http://www.example.com/index.phpHELLO?route=9&other=7HELLO/
http://www.example.com/index.phpHELLO?route=HELLO9&other=7/
http://www.example.com/index.phpHELLO?route=HELLO&other=7/
http://www.example.com/index.phpHELLO?route=9HELLO&other=7/

Upvotes: 1
