Reputation: 138
I have a URL:
https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D
And I want to match it using preg_match_all. My regex expression is:
preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/(?)\/.*\">D)/', $input_lines, $output_array);
But I am not able to match the special character ? in the above code. I tried using (?), but it does not match. I know it may be a lame question, but if anyone could help me match or escape ? in preg_match_all, that would be helpful.
Upvotes: 2
Views: 381
Reputation: 6148
/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/(?)\/.*\">D)/

(1) The first and last / : starting and ending delimiters
(2) [u] and [0] : no need for a character set
(3) (ac) : no need for a capture group
(4) \/ after (ac) : this slash doesn't exist
(5) (?) : `?` is a special character in regex and does nothing in this scenario; the .* is actually matching the `?`
(6) .* : this is a greedy match and may not stop where intended
1. Regular expression pattern delimiters mark the start and end of a pattern, similar to single or double quotes marking the start and end of a string.
As with quotes, if you use the delimiter inside the pattern you have to escape it.
To avoid escaping, you can use a different delimiter:
Pattern 1: /https:\/\/www\.website\.com\/page\/1\/index\.php/
Pattern 2: ~https://www\.website\.com/page/1/index\.php~
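As a quick check of the delimiter point, here is a minimal sketch that runs both patterns against the same string. The sample URL is only an illustration built from the patterns above, not something from the question.

```php
<?php
// Hypothetical sample URL, used only to illustrate delimiters.
$url = 'https://www.website.com/page/1/index.php';

// The same pattern written with two different delimiters.
$withSlashes = '/https:\/\/www\.website\.com\/page\/1\/index\.php/'; // every / must be escaped
$withTildes  = '~https://www\.website\.com/page/1/index\.php~';      // no slash escaping needed

$slashResult = preg_match($withSlashes, $url, $slashMatch);
$tildeResult = preg_match($withTildes, $url, $tildeMatch);

var_dump($slashResult, $tildeResult);         // both report one match
var_dump($slashMatch[0] === $tildeMatch[0]);  // both matched the same text
```

Both versions behave the same; the tilde version is just easier to read when the pattern contains many slashes.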
2. As you just want to match characters literally, you can simply use the characters in the pattern. You only need a character set if the character could be one of several values.
Set Matched value
u ===> u
[u] ===> u
[ua] ===> u OR a
3. As with 2, you don't need a capture group here because you're only interested in capturing the whole string. The group would only add an extra entry containing "ac" to $output_array.
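To see what the extra capture group does to the result, here is a small sketch comparing the two variants. The input is a shortened, hypothetical version of the URL from the question, and the variable names are my own.

```php
<?php
// Shortened, hypothetical version of the URL from the question.
$input = 'https://my.site.com/u/0/ac?export=download">D';

// With a capture group around "ac".
preg_match_all('~https://my\.site\.com/u/0/(ac)\?.*?">D~', $input, $withGroup);

// Without the capture group.
preg_match_all('~https://my\.site\.com/u/0/ac\?.*?">D~', $input, $withoutGroup);

print_r($withGroup);    // two sub-arrays: the full match and the captured "ac"
print_r($withoutGroup); // one sub-array: the full match only
```

If you only ever read the full match at index 0, the second form keeps the result array smaller.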
4. For some reason you're trying to match a / that doesn't exist in the URL, so the pattern will never return anything.
5. The ? is a special character in regex; typically it is used at the start of a group (a), to modify a quantifier (b), or to make a construct optional (c). In this case (?) does absolutely nothing; the .* matches the literal ?, or would do if the slash wasn't in the pattern.
a. Used at the start of a group, the ? can mean, for example:
(?:...) ===> Non-capturing group
(?=...) ===> Positive lookahead
(?!...) ===> Negative lookahead
b. To modify a quantifier: a quantifier such as + or * is normally greedy and matches as much as possible. Placing a ? after it makes it non-greedy, so it stops at the first possible match.
String: IIIIOIIIOIIIO
Pattern Match
/I.*O/ IIIIOIIIOIIIO
/I.*?O/ IIIIO
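The table above can be reproduced with a short sketch like this one (the variable names are my own):

```php
<?php
$string = 'IIIIOIIIOIIIO';

// Greedy: .* grabs as much as possible, then backtracks to the last O.
preg_match('/I.*O/', $string, $greedy);

// Non-greedy: .*? grabs as little as possible and stops at the first O.
preg_match('/I.*?O/', $string, $lazy);

echo $greedy[0], PHP_EOL; // IIIIOIIIOIIIO
echo $lazy[0], PHP_EOL;   // IIIIO
```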
c. To make a construct optional
Pattern Match 1 Match 2 Explanation
~https?://~ http:// https:// Optional character
~(?:www\.)?website\.com~ website.com www.website.com Optional non-capturing group
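Both optional constructs can be combined in one pattern. The sketch below is only an illustration: the ^ and $ anchors and the list of candidate strings are my own additions, not part of the question.

```php
<?php
// s? makes the "s" optional; (?:www\.)? makes the whole "www." prefix optional.
// The ^ and $ anchors are an assumption added so that the whole string must match.
$pattern = '~^https?://(?:www\.)?website\.com$~';

// Hypothetical candidate strings.
$candidates = [
    'http://website.com',
    'https://www.website.com',
    'ftp://website.com',
];

$results = [];
foreach ($candidates as $candidate) {
    $results[$candidate] = preg_match($pattern, $candidate) === 1;
}

var_dump($results); // true, true, false
```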
6. As per 5b, this is a greedy quantifier, so if, for example, the sequence ">D appeared more than once in a string, the pattern would match up to the last occurrence. That is, if there were more than one URL in your string, it would match from the first to the last instead of matching them individually.
String: <a href="website.com?id=2432546t4534">Link 1</a><a href="website.com?id=24345yr6787">Link 2</a>
Pattern Matches
~website.com\?id=.*">~ [1] website.com?id=2432546t4534">Link 1</a><a href="website.com?id=24345yr6787">
~website.com\?id=.*?">~ [1] website.com?id=2432546t4534">
[2] website.com?id=24345yr6787">
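Here is a minimal sketch of the same comparison using preg_match_all, with the string from the table above. The dot in website.com is escaped here, which is a small tidy-up on my part.

```php
<?php
$html = '<a href="website.com?id=2432546t4534">Link 1</a>'
      . '<a href="website.com?id=24345yr6787">Link 2</a>';

// Greedy: one long match running from the first URL to the last ">.
preg_match_all('~website\.com\?id=.*">~', $html, $greedy);

// Non-greedy: each URL is matched on its own.
preg_match_all('~website\.com\?id=.*?">~', $html, $lazy);

print_r($greedy[0]); // a single element spanning both links
print_r($lazy[0]);   // two elements, one per link
```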
Updated Regex
~https://my\.site\.com/u/0/ac\?.*?">D~
~ : Starting delimiter
https://my\.site\.com/u/0/ac : Matches the initial part of the URL
\? : Matches a literal ?
.*? : Non-greedy match any character 0 or more times
">D : Match string literally
~ : Ending delimiter
Code
$input_lines = 'https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';
preg_match_all('~https://my\.site\.com/u/0/ac\?.*?">D~', $input_lines, $output_array);
print_r($output_array);
Output
Array
(
    [0] => Array
        (
            [0] => https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D
        )

)
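If you would rather not escape the literal part by hand, PHP's preg_quote can do it for you. This is only a sketch of an alternative, not something the question requires; the $prefix and $pattern variable names are my own.

```php
<?php
$input = 'https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';

// preg_quote escapes regex special characters (including . and ?) in a literal string.
// The second argument is the delimiter, which gets escaped too if it appears.
$prefix = preg_quote('https://my.site.com/u/0/ac?', '~');

// Build the pattern: escaped literal prefix, then a non-greedy match up to ">D.
$pattern = '~' . $prefix . '.*?">D~';

preg_match_all($pattern, $input, $matches);
print_r($matches);
```

This is mostly useful when the literal part comes from a variable and you can't know in advance which special characters it contains.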
Upvotes: 2
Reputation: 968
I just noticed that there is no / after ac in the link, but you are including one in the regex. Either remove it or use the code below; it is working and tested.
<?php
$input_lines = 'https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';
preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)(\?).*\">D)/', $input_lines, $output_array);
var_dump($output_array);
This is the output: https://prnt.sc/weq86u
Or, if there is a chance that a / can occur after ac (i.e. ac/?), you can make the / optional in the regex:
<?php
$input_lines = 'https://my.site.com/u/0/ac?export=download&confirm=45vy&id=qNhdhk1jejhXLexLpY3RiDY2oamis">D';
preg_match_all('/(https:\/\/my\.site\.com\/[u]\/[0]\/(ac)\/?(\?).*\">D)/', $input_lines, $output_array);
var_dump($output_array);
It will match links both with and without the /: https://prnt.sc/weqbae
Upvotes: 3