Reputation: 57966
I am trying to match a pattern so that I can retrieve a string from a website. Here is the string in question:
<a title="Posts by ivek dhwWaVa"
href="http://www.example.com/author/ivek/"
rel="nofollow">ivek</a>
I am trying to match the string "ivek" between the opening and closing a tags, and I want to do this for each post and relate it to the number of comments.
First, what regex should I use for the above, so that I can use it as an example for the rest? I have nothing so far:
$content = file_get_contents('http://www.example.com');
preg_match_all("", $content, $matches);
And how would I relate the comments to the author's name, given that there are many other authors on the website, each with their own set of comments? Do I use the divs to break this up? Each set of info is wrapped in a div like this:
<div id="post-54" class="excerpt">
Thanks all for any help!
Upvotes: 2
Views: 1609
Reputation: 94237
Please let me be the first to introduce you to the most famous answer on Stack Overflow.
Regular expressions are not suited to parsing HTML. You really need an HTML parser, even for what might appear to be a simple task.
I recommend something like PHP Simple HTML DOM Parser.
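As a rough illustration of the parser approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM Parser. The author link and the excerpt div come from the question; the `<span class="comments">` element, the second post, and the function name `extractPosts` are assumptions made up for the example, so adjust the XPath expressions to whatever the real page uses.

```php
<?php
// Sample markup modelled on the question. The "comments" span and the
// second post are hypothetical; the real page may look different.
$html = <<<HTML
<div id="post-54" class="excerpt">
  <a title="Posts by ivek dhwWaVa" href="http://www.example.com/author/ivek/" rel="nofollow">ivek</a>
  <span class="comments">3 comments</span>
</div>
<div id="post-55" class="excerpt">
  <a title="Posts by anna" href="http://www.example.com/author/anna/" rel="nofollow">anna</a>
  <span class="comments">12 comments</span>
</div>
HTML;

// Returns one entry per excerpt div: its id, the author name, the comment count.
function extractPosts(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $posts = [];

    // Each post lives in its own <div class="excerpt">, so query per div
    // to keep an author and a comment count tied to the same post.
    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " excerpt ")]');
    foreach ($divs as $div) {
        $author = $xpath->query('.//a[contains(@href, "/author/")]', $div)->item(0);
        $comments = $xpath->query('.//span[@class="comments"]', $div)->item(0);

        $posts[] = [
            'id'       => $div->getAttribute('id'),
            'author'   => $author ? trim($author->textContent) : null,
            // An (int) cast keeps the leading digits of text such as "3 comments".
            'comments' => $comments ? (int) trim($comments->textContent) : null,
        ];
    }

    return $posts;
}

print_r(extractPosts($html));
```

To run this against the live site, pass the result of `file_get_contents('http://www.example.com')` to the function in place of the sample string. Querying inside each div is what answers the second half of the question: the author and the comment count are looked up relative to the same post, so they cannot get mixed up between posts.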
Upvotes: 5
Reputation: 11546
You really shouldn't be relying on regex to do this job. See these related questions:
Can you provide some examples of why it is hard to parse XML and HTML with a regex?
Can you provide an example of parsing HTML with your favorite parser?
Upvotes: 3