Reputation: 138

preg_match_all for special characters [?]

I have a URL:

https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D

I want to match it using preg_match_all. My regular expression is:

preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/(?)\/.*\">D)/', $input_lines, $output_array);

But I am not able to match the special character ? in the code above. I tried using (?), but it does not match. I know it may be a basic question, but if anyone could help me match or escape ? in preg_match_all, that would be helpful.

Upvotes: 2

Answers (2)

Steven

Reputation: 6148

Your regex

/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/(?)\/.*\">D)/
^                           ^    ^    ^   ^ ^    ^     ^
1                           2    2    3   4 5    6     1
+-- Starting delimiter      |    |    |   | |    |     +-- Ending delimiter
                            |    |    |   | |    +-- This is a greedy match and may not stop where intended
                            |    |    |   | +-- `?` is a special character in Regex and does nothing in this scenario; the .* is actually matching the `?`
                            |    |    |   +-- This slash doesn't exist
                            |    |    +-- No need for a capture group
                            +----+-- No need for a character set

1. Regular expression pattern delimiters:
- ...mark the start and end of a pattern, similar to single/double quotes marking the start and end of strings
- As with quotes, if you use the delimiter inside the pattern you have to escape it
- To avoid escaping, you can use a different delimiter
```
Pattern 1: /https:\/\/www\.website\.com\/page\/1\/index\.php/

Pattern 2: ~https://www\.website\.com/page/1/index\.php~
```
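As a rough PHP sketch of the delimiter point, the two patterns below differ only in their delimiter and should behave the same. The sample string `$subject` and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$subject = 'See https://www.website.com/page/1/index.php for details';

// The same pattern written with two different delimiters.
$withSlashes = '/https:\/\/www\.website\.com\/page\/1\/index\.php/'; // every / must be escaped
$withTildes  = '~https://www\.website\.com/page/1/index\.php~';      // no escaping of / needed

$slashResult = preg_match($withSlashes, $subject, $slashMatch);
$tildeResult = preg_match($withTildes, $subject, $tildeMatch);

// Both calls report a match, and the matched text is identical.
var_dump($slashResult, $tildeResult);
var_dump($slashMatch[0] === $tildeMatch[0]);
```

Any non-alphanumeric, non-backslash, non-whitespace character can serve as a delimiter; `~` and `#` are common choices for URLs because they rarely appear in the pattern itself.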

2. As you only want to match the characters literally, you can simply write them in the pattern. You only need a character set if the character could take one of several values:

   Set       Matched value
   u    ===> u
   [u]  ===> u
   [ua] ===> u OR a
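
The table can be checked with a short PHP sketch. The test inputs and the `$results` array are hypothetical; the `^` and `$` anchors are added only so that each pattern is tested against the whole one-character input.

```php
<?php
// Hypothetical one-character inputs, used only to illustrate the table.
$inputs  = ['u', 'a', 'x'];
$results = [];

foreach ($inputs as $input) {
    $literal = preg_match('~^u$~', $input);    // plain literal character
    $single  = preg_match('~^[u]$~', $input);  // one-character set: same result as the literal
    $either  = preg_match('~^[ua]$~', $input); // set with two alternatives: u OR a

    $results[$input] = [$literal, $single, $either];
    echo "$input: literal=$literal single=$single either=$either\n";
}
```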

3. As with 2, you don't need a capture group here because you're only interested in capturing the whole string. The group just adds an extra entry containing "ac" to your output ($output_array[2], because the outer parentheses around the whole pattern already form group 1).
4. For some reason you're trying to match a / that doesn't exist in the URL, so the pattern will never match.
5. The ? is a special character in regex; typically it is used at the start of a group (a), to modify a quantifier (b), or to make a construct optional (c). In this case (?) does absolutely nothing; the .* would match the literal ? if the stray slash weren't in the pattern.

a. Used in a group the ? can mean, for example:
```
   (?:...) ===> Non-capturing group
   (?=...) ===> Positive lookahead
   (?!...) ===> Negative lookahead
```
b. To modify a quantifier: a quantifier such as + or * is greedy by default and matches as much as possible. Placing a ? after it makes it non-greedy (lazy), so it stops at the first possible match
```
String: IIIIOIIIOIIIO

Pattern       Match

/I.*O/        IIIIOIIIOIIIO
/I.*?O/       IIIIO
```
c. To make a construct optional
```
Pattern                  Match 1             Match 2             Explanation

~https?://~              http://             https://            Optional character
~(?:www\.)?website.com~  website.com         www.website.com     Optional non-capturing group
```
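
The greedy/lazy and optional cases can be tried out with a small PHP sketch. The `IIIIOIIIOIIIO` string comes from the table in (b); the two URLs and all variable names are assumptions made for the example.

```php
<?php
// (b) Greedy vs. non-greedy quantifier, using the sample string from the table.
$string = 'IIIIOIIIOIIIO';

preg_match('/I.*O/', $string, $greedy); // greedy: runs on to the last O
preg_match('/I.*?O/', $string, $lazy);  // non-greedy: stops at the first O

echo $greedy[0], "\n";
echo $lazy[0], "\n";

// (a) + (c) Optional character and optional non-capturing group.
$urls        = ['http://website.com', 'https://www.website.com']; // hypothetical inputs
$matched     = [];
$groupCounts = [];

foreach ($urls as $url) {
    $matched[$url]     = preg_match('~^https?://(?:www\.)?website\.com$~', $url, $m);
    $groupCounts[$url] = count($m); // only index 0 exists: (?:...) captures nothing
}

print_r($matched);
print_r($groupCounts);
```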

6. As per 5b, this is a greedy quantifier, so if, for example, the pattern \">D were to appear more than once in a string, this would match up to the last occurrence.

i.e. if there were more than one URL in your string, it would match from the first to the last instead of matching them individually:

String: <a href="website.com?id=2432546t4534">Link 1</a><a href="website.com?id=24345yr6787">Link 2</a>

Pattern                    Matches

~website.com\?id=.*">~     [1] website.com?id=2432546t4534">Link 1</a><a href="website.com?id=24345yr6787">

~website.com\?id=.*?">~    [1] website.com?id=2432546t4534">
                           [2] website.com?id=24345yr6787">
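
The same comparison as a PHP sketch, using the string from the table. The variable names are made up, and the dot in `website.com` is escaped here as a small tightening of the patterns shown above.

```php
<?php
$html = '<a href="website.com?id=2432546t4534">Link 1</a><a href="website.com?id=24345yr6787">Link 2</a>';

// Greedy: .* runs to the last "> in the string, so both links end up in one match.
$greedyCount = preg_match_all('~website\.com\?id=.*">~', $html, $greedy);

// Non-greedy: .*? stops at the first "> each time, so each link is matched separately.
$lazyCount = preg_match_all('~website\.com\?id=.*?">~', $html, $lazy);

echo "Greedy matches: $greedyCount\n";
print_r($greedy[0]);

echo "Non-greedy matches: $lazyCount\n";
print_r($lazy[0]);
```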

Fix

Updated Regex

~https://my\.site\.com/u/0/ac\?.*?">D~
~                                      : Starting delimiter
 https://my\.site\.com/u/0/ac          : Matches the initial part of the URL
                             \?        : Matches a literal ?
                               .*?     : Non-greedy match any character 0 or more times
                                  ">D  : Match string literally
                                     ~ : Ending delimiter

Code

$input_lines  = 'https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';

preg_match_all('~https://my\.site\.com/u/0/ac\?.*?">D~', $input_lines, $output_array);

print_r($output_array);

Output

Array
(
    [0] => Array
        (
            [0] => https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D
        )

)
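
As an alternative to escaping by hand (not part of the fix above, just a sketch), PHP's preg_quote() can escape the literal part of the URL, including the ?. The `$prefix`, `$quoted` and `$pattern` names are assumptions for this example.

```php
<?php
$input_lines = 'https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';

// The fixed, literal part of the URL, including the ? that needs escaping.
$prefix = 'https://my.site.com/u/0/ac?';

// preg_quote() escapes regex metacharacters such as . and ?;
// the second argument makes it escape the chosen delimiter as well.
$quoted  = preg_quote($prefix, '~');
$pattern = '~' . $quoted . '.*?">D~';

$count = preg_match_all($pattern, $input_lines, $output_array);

echo $pattern, "\n";
print_r($output_array);
```

This is mostly useful when the literal part comes from a variable; for a fixed pattern, writing \? directly is just as good.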

Upvotes: 2

Akhilesh

Reputation: 968

I just noticed that there is no / after ac in the link, but your regex includes one. Either remove it or use the code below, which is tested and working.

<?php

$input_lines = 'https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';
preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)(\?).*\">D)/', $input_lines, $output_array);

var_dump($output_array);

This is the output: https://prnt.sc/weq86u

Or, if a / may sometimes appear after ac (i.e. ac/?), you can make the / optional in the regex:

<?php

$input_lines = 'https://my.site.com/u/0/ac?export=download&amp;confirm=45vy&amp;id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';
preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/?(\?).*\">D)/', $input_lines, $output_array);

var_dump($output_array);

It will match links both with and without the /: https://prnt.sc/weqbae
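
A quick way to check this locally, as a sketch: run the pattern with the optional / against both forms. The two shortened links and the `$counts` array are hypothetical test data, not taken from the question.

```php
<?php
// Pattern with the optional slash (\/?) after ac.
$pattern = '/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/?(\?).*\">D)/';

// Hypothetical shortened links: one without and one with a / after ac.
$links = [
    'https://my.site.com/u/0/ac?export=download">D',
    'https://my.site.com/u/0/ac/?export=download">D',
];

$counts = [];
foreach ($links as $link) {
    // preg_match_all() returns the number of full matches found.
    $counts[$link] = preg_match_all($pattern, $link, $output_array);
}

print_r($counts);
```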

Upvotes: 3
