NicolaBaldi

Reputation: 133

Regular Expressions: querystring parameters matching

I'm trying to learn about regular expressions.
Here is what I want to match:

/parent/child  
/parent/child?  
/parent/child?firstparam=abc123  
/parent/child?secondparam=def456  
/parent/child?firstparam=abc123&secondparam=def456  
/parent/child?secondparam=def456&firstparam=abc123  
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child?thirdparam=ghi789  
/parent/child/  
/parent/child/?  
/parent/child/?firstparam=abc123  
/parent/child/?secondparam=def456  
/parent/child/?firstparam=abc123&secondparam=def456  
/parent/child/?secondparam=def456&firstparam=abc123  
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child/?thirdparam=ghi789

My expression should "grab" abc123 and def456.
And here is an example of what I don't want to match (the question mark is missing):

/parent/child/firstparam=abc123&secondparam=def456

Well, I built the following expression:

^(?:/parent/child){1}(?:^(?:/\?|\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?

But that doesn't work.
Could you help me understand what I'm doing wrong?
Thanks in advance.

UPDATE 1

OK, I ran some more tests. I'm trying to fix the previous version with something like this:

/parent/child(?:(?:\?|/\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?$

Let me explain my idea:
Must start with /parent/child:

/parent/child

The following group is optional:

(?: ... )?

The optional group must start with ? or /?:

(?:\?|/\?)+

Optional parameters (I grab values if specified parameters are part of querystring)

(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?

End of line

$

Any advice?

UPDATE 2

My solution must be based only on regular expressions. For example, I previously wrote the following one:

/parent/child(?:[?&/]*(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*))*$

And that works pretty well. But it also matches the following input:

/parent/child/firstparam=abc123&secondparam=def456

How could I modify the expression so that it does not match the previous string?

Upvotes: 3

Views: 4397

Answers (5)

user3475634

Reputation: 11

This regex will work as long as you know what your parameter names are going to be and you're sure that they won't change.

\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?

Whilst regex is not the best solution for this (the string-function approaches in the other answers are simpler and will probably be faster), this will work if you need a regex solution with up to 3 parameters. Out of interest, why must the solution use only regex?

In any case, this regex will match the following strings:

/parent/child?firstparam=abc123  
/parent/child?secondparam=def456  
/parent/child?firstparam=abc123&secondparam=def456  
/parent/child?secondparam=def456&firstparam=abc123  
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child?thirdparam=ghi789  
/parent/child/?firstparam=abc123  
/parent/child/?secondparam=def456  
/parent/child/?firstparam=abc123&secondparam=def456  
/parent/child/?secondparam=def456&firstparam=abc123  
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child/?thirdparam=ghi789

It only matches strings that contain query string parameters, and it puts their values into capture groups in the order in which they appear.

What language are you using to process your matches?

If you are using preg_match with PHP, you can get the whole match as well as capture groups in an array with

preg_match($regex, $string, $matches);

Then you can access the whole match with $matches[0] and the rest with $matches[1], $matches[2], etc.

If you want to support additional parameters, you'll need to add their names to the regex and add another optional group for each one. For example, if you had

/parent/child/?secondparam=def456&firstparam=abc123&fourthparam=jkl01112&thirdparam=ghi789

The regex will become

\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?

This will become a bit more tedious to maintain as you add more parameters, though.

You can optionally add ^ and $ at the start and end (enable the multi-line flag if you are matching several lines at once). If you also need to match lines without query strings, wrap this whole regex in a non-capture group (including ^ and $) and add

|(?:^\/parent\/child\/?\??$)

to the end.
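
A minimal Python sketch of how the pieces above fit together, assuming Python purely for illustration since the question names no language. SLOT, PATTERN and captured_values are hypothetical names introduced for this sketch. It assembles one mandatory slot, two optional slots and the fallback alternative, then returns whatever the capture groups caught.

```python
import re

# One parameter slot, copied from the regex above (parameter names are fixed).
SLOT = r"(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)"

# Mandatory first slot, two optional slots, then the fallback alternative
# for paths that carry no parameters at all.
PATTERN = re.compile(
    r"(?:^\/parent\/child\/?\?" + SLOT + SLOT + "?" + SLOT + "?$)"
    + r"|(?:^\/parent\/child\/?\??$)"
)


def captured_values(url):
    """Return captured values in order of appearance, or None if no match."""
    m = PATTERN.search(url)
    if m is None:
        return None
    # Unused optional slots come back as None, so drop them.
    return [g for g in m.groups() if g is not None]


if __name__ == "__main__":
    print(captured_values("/parent/child?secondparam=def456&firstparam=abc123"))
    print(captured_values("/parent/child/firstparam=abc123&secondparam=def456"))
```

Note that the groups are positional: group 1 holds the value of whichever parameter comes first in the URL, not necessarily firstparam. If you need to know which value belongs to which name, you would have to capture the names as well.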

Upvotes: 1

gaussblurinc

Reputation: 3682

This script will help you.
First, I check whether the line contains a ?.
Then I remove the first part of the line (everything up to and including the ?).
Next, I split the rest on &, and split each piece on =.

my $r = q"/parent/child  
/parent/child?  
/parent/child?firstparam=abc123  
/parent/child?secondparam=def456  
/parent/child?firstparam=abc123&secondparam=def456  
/parent/child?secondparam=def456&firstparam=abc123  
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child?thirdparam=ghi789  
/parent/child/  
/parent/child/?  
/parent/child/?firstparam=abc123  
/parent/child/?secondparam=def456  
/parent/child/?firstparam=abc123&secondparam=def456  
/parent/child/?secondparam=def456&firstparam=abc123  
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123  
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789  
/parent/child/?thirdparam=ghi789";


use feature 'say';  # say() is not enabled by default

for my $string (split /\n/, $r) {
    $string =~ s/\s+$//;                      # the sample lines end in two spaces
    my $pos = index($string, '?');
    next if $pos == -1;                       # no ? at all: nothing to parse
    my $query = substr($string, $pos + 1);    # keep only what follows the ?
    next if index($query, '=') == -1;         # a bare ? with no parameters
    # Split on & first, then split each name=value pair on the first =.
    my @params = map { [split /=/, $_, 2] } split /&/, $query;
    say "$_->[0] === $_->[1]" for @params;
    say "######next########";
}

Upvotes: 0

godspeedlee

Reputation: 672

My solution:
/(?:\w+/)*(?:(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?|\w+|)

Explanation:
/(?:\w+/)* matches /parent/child/ or /parent/

(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)? matches child?firstparam=abc123 or ?firstparam=abc123 or ?

\w+ matches text like child

..|) matches nothing (the empty string)
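
To check the pieces above against the inputs from the question, here is a small Python sketch (Python is an assumption, used only for illustration). PATH_PREFIX, WITH_QUERY and is_valid are hypothetical names. It uses fullmatch so that the whole string has to fit the pattern.

```python
import re

# The pattern from this answer, split into the pieces explained above.
PATH_PREFIX = r"/(?:\w+/)*"
WITH_QUERY = r"(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?"
PATTERN = re.compile(PATH_PREFIX + "(?:" + WITH_QUERY + r"|\w+|)")


def is_valid(url):
    """True if the whole string fits the pattern."""
    return PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    for url in (
        "/parent/child",
        "/parent/child/?firstparam=abc123&secondparam=def456",
        "/parent/child/firstparam=abc123&secondparam=def456",
    ):
        print(url, is_valid(url))
```

Since the pattern uses \w+ for every path segment, it accepts any path of that shape, not only /parent/child. The string without a question mark is rejected because = can only appear after a ?.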

If you only need the query string, the pattern reduces to:
/(?:\w+/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)

If you want to get every parameter from the query string, here is a Ruby sample:

re = /\/(?:\w+\/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)/
s = '/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789'
if m = s.match(re)
    query_str = m[1] # now, you can 100% trust this string
    query_str.scan(/(\w+)=(\w+)/) do |param,value| #grab parameter
        printf("%s, %s\n", param, value)
    end
end

Output:

secondparam, def456
firstparam, abc123
thirdparam, ghi789

Upvotes: 0

Qsario

Reputation: 1026

For starters, you're not escaping the /s in your regex (which matters in languages where / delimits the regex), and using {1} for a single repetition of something is unnecessary; you only need braces when you want more than one repetition or a range of repetitions.

And part of what you're trying to do is simply not a good use of a regex. I'll show you an easier way to deal with that: you want to use something like split and put the information into a hash that you can check the contents of later. Because you didn't specify a language, I'm just going to use Perl for my example, but every language I know with regexes also has easy access to hashes and something like split, so this should be easy enough to port:

 # I picked an example to show how this works.
 my $route = '/parent/child/?first=123&second=345&third=678';
 my %params;  # I'm going to put those URL parameters in this hash.

 # Perl has a way to let me avoid escaping the /s, but I wanted an example that
 # works in other languages too.
 if ($route =~ m/^\/parent\/child\/?\?(.*)/) {  # Use the regex for this part; \/? makes the trailing slash optional
   print "Matched route.\n";
   # But NOT for this part. 
   my $query = $1;  # $1 is a Perl thing.  It contains what (.*) matched above.
   my @items = split '&', $query;  # Each item is something like param=123
   foreach my $item (@items) {
     my ($param, $value) = split '=', $item;
     $params{$param} = $value;  # Put the parameters in a hash for easy access.
     print "$param set to $value \n";
   }
 }

 # Now you can check the parameter values and do whatever you need to with them.
 # And you can add new parameters whenever you want, etc.
 if ($params{'first'} eq '123') {
   # Do whatever
 }
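
 For comparison, a minimal Python sketch of the same split-then-look-up idea. This swaps in the standard library's urllib.parse for the manual split, which is a different technique from the Perl above; ROUTE and route_params are hypothetical names, and keeping only the first value of a repeated parameter is an assumption.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The route check stays a regex; the trailing slash is optional.
ROUTE = re.compile(r"/parent/child/?")


def route_params(url):
    """Return a dict of query parameters, or None if the route does not match."""
    parts = urlsplit(url)  # separates the path from whatever follows the '?'
    if not ROUTE.fullmatch(parts.path):
        return None
    # parse_qs maps each name to a list of values; keep the first one (assumption).
    return {name: values[0] for name, values in parse_qs(parts.query).items()}


if __name__ == "__main__":
    print(route_params("/parent/child/?first=123&second=345&third=678"))
    print(route_params("/parent/child/firstparam=abc123&secondparam=def456"))
```

 Without a question mark the whole string counts as the path, so the route check fails and nothing is parsed.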

Upvotes: 0

FailedDev

Reputation: 26930

You didn't specify a language, so I'll just use Perl. Instead of matching everything, I matched exactly what I thought you needed. Please correct me if I am wrong.

while ($subject =~ m/(?<==)\w+?(?=&|\W|$)/g) {
    # matched text = $&
}

(?<=        # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
   =        # Match the character “=” literally
)
\w          # Match a single character that is a “word character” (letters, digits, and underscores)
   +?       # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?=         # Assert that the regex below can be matched, starting at this position (positive lookahead)
            # Match either the regular expression below (attempting the next alternative only if this one fails)
      &     # Match the character “&” literally
   |        # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
      \W    # Match a single character that is a “non-word character”
   |        # Or match regular expression number 3 below (the entire group fails if this one fails to match)
      $     # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
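
The same pattern can be tried in Python as a quick sketch (Python is an assumption here; VALUE and values are hypothetical names). findall plays the role of the while loop with /g above.

```python
import re

# Same pattern as the Perl loop above: a run of word characters right after '='.
VALUE = re.compile(r"(?<==)\w+?(?=&|\W|$)")


def values(subject):
    """Return every parameter value found in the subject string."""
    return VALUE.findall(subject)


if __name__ == "__main__":
    print(values("/parent/child?firstparam=abc123&secondparam=def456"))
    print(values("/parent/child/firstparam=abc123&secondparam=def456"))
```

This grabs the value of every parameter, whatever its name, and it does not check for the question mark, so the second string yields the same values as the first.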


Upvotes: 2
