Reputation: 2866
Original format:
<a href="http://www.example.com/t434234.html" ...>
I need to fetch all URLs of this format:
http://www.example.com/t[ANY CHARACTER].html
ANY CHARACTER is the part that changes from one URL to another. The rest is fixed.
Here is my attempt:
preg_match("#http:\/\/www\.aqarcity\.com\/t[a-zA-Z0-9_]\.html#", $page, $urls);
I get empty results. I don't know where I went wrong.
Upvotes: 0
Views: 222
Reputation: 17861
The problem appears to be that [a-zA-Z0-9_] will only match exactly one character. If you want to match zero or more characters, use [a-zA-Z0-9_]*. For one or more, use [a-zA-Z0-9_]+. For exactly six characters, use [a-zA-Z0-9_]{6}. For, e.g., one to six characters, use [a-zA-Z0-9_]{1,6}.
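As a rough sketch of the quantifier fix (not necessarily the exact pattern your real page needs): the sample $page string below is made up for illustration, and the domain is the example.com one from your description rather than the one in your attempt. Note that preg_match() stops after the first match, so for "all URLs" this sketch assumes you want preg_match_all() instead.

```php
<?php
// Hypothetical sample input; in the question, $page holds the fetched HTML.
$page = '<a href="http://www.example.com/t434234.html">one</a> '
      . '<a href="http://www.example.com/t9.html">two</a> '
      . '<a href="http://www.example.com/about.html">other</a>';

// "+" lets the character class match one or more characters
// instead of exactly one.
$pattern = '#http://www\.example\.com/t[a-zA-Z0-9_]+\.html#';

// preg_match() returns only the first match;
// preg_match_all() collects every match into $matches[0].
$count = preg_match_all($pattern, $page, $matches);
$urls  = $matches[0];

print_r($urls);
```

Swap + for *, {6}, or {1,6} in $pattern depending on how many characters you expect after the t.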
Also note that, since you're using # as the delimiter, you don't need to escape the / characters. As far as I know this will not make your code misbehave, but it'll be easier to read if you remove the backslashes before the slashes.
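If you want to convince yourself that the escaping is optional, a small check along these lines should do it. The $subject URL is a made-up sample in the format from the question, and both patterns already use the + quantifier so that they can match at all.

```php
<?php
// Hypothetical sample URL in the format described in the question.
$subject = 'http://www.example.com/t434234.html';

// With "#" as the delimiter, "/" is an ordinary character,
// so escaping it is allowed but unnecessary.
$escaped   = '#http:\/\/www\.example\.com\/t[a-zA-Z0-9_]+\.html#';
$unescaped = '#http://www\.example\.com/t[a-zA-Z0-9_]+\.html#';

$escapedMatches   = preg_match($escaped, $subject) === 1;
$unescapedMatches = preg_match($unescaped, $subject) === 1;

var_dump($escapedMatches, $unescapedMatches);
```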
Finally, please realize that regular expressions are a rather dangerous way to work with HTML. In this case, you may pick up matching URLs from comments, JavaScript code, and other things that aren't links. It is literally impossible to correctly parse HTML with unaugmented regular expressions: they don't have the expressive power necessary to do so. I don't know what sorts of HTML parsers are available for PHP, but you may want to look into them.
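One option that might work here is PHP's bundled DOM extension (DOMDocument). The sketch below is only an illustration under assumptions: the $page markup is invented, the DOM extension is assumed to be enabled, and the regex is applied to each href value after parsing rather than to the raw HTML.

```php
<?php
// Hypothetical sample page: one commented-out link, one matching link,
// and one link that does not fit the t....html format.
$page = '<html><body>'
      . '<!-- <a href="http://www.example.com/t111.html">commented out</a> -->'
      . '<a href="http://www.example.com/t434234.html">topic</a>'
      . '<a href="http://www.example.com/about.html">about</a>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($page);
libxml_clear_errors();

$urls = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Anchored pattern: the whole href must have the expected shape.
    if (preg_match('#^http://www\.example\.com/t[a-zA-Z0-9_]+\.html$#', $href)) {
        $urls[] = $href;
    }
}

print_r($urls);
```

Because the parser only yields real a elements, the link inside the HTML comment is not picked up, which is the failure mode described above for the regex-only approach.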
Upvotes: 1