Abs

Reputation: 57966

Regex to match html attributes

I am trying to match a pattern so that I can retrieve a string from a website. Here is the string in question:

<a title="Posts by ivek dhwWaVa"
href="http://www.example.com/author/ivek/"
rel="nofollow">ivek</a>

I am trying to match the string "ivek" between the opening and closing a tags. I want to do this for each post and relate it to that post's number of comments.

Firstly, what regex should I use for the above, so that I can use it as an example for the rest? I have nothing so far:

$content = file_get_contents('http://www.example.com');
preg_match_all("", $content, $matches);

Secondly, how would I relate the comment count to the author's name? There are many other authors on the website, each with their own set of comments. Do I use the divs to break this up? Each set of info is wrapped in a div like this:

<div id="post-54" class="excerpt">

Thanks all for any help!

Upvotes: 2

Views: 1609

Answers (2)

zombat

Reputation: 94237

Please let me be the first to introduce you to the most famous answer on Stack Overflow.

Regular expressions are not suited to parsing HTML. You really need an HTML parser, even for what might appear to be a simple task.

I recommend something like PHP Simple HTML DOM Parser.
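To give a rough idea of what the parser approach looks like, here is a minimal sketch. Note that it uses PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than PHP Simple HTML DOM Parser, simply because it needs no extra download. The sample markup is modelled on the snippets in the question, but the second post, the `span class="comments"` element, and the `extractPosts` function name are my own assumptions for illustration; the real comment markup is not shown, so that query will need adjusting.

```php
<?php
// Sample markup modelled on the question. The second post and the
// "comments" span are HYPOTHETICAL: the real comment markup is not shown.
$html = <<<HTML
<div id="post-54" class="excerpt">
  <a title="Posts by ivek" href="http://www.example.com/author/ivek/" rel="nofollow">ivek</a>
  <span class="comments">3 comments</span>
</div>
<div id="post-55" class="excerpt">
  <a title="Posts by anna" href="http://www.example.com/author/anna/" rel="nofollow">anna</a>
  <span class="comments">7 comments</span>
</div>
HTML;

// Hypothetical helper: returns [post id => ['author' => ..., 'comments' => ...]].
function extractPosts(string $html): array
{
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $posts = [];

    // One iteration per post wrapper, so author and comments stay paired.
    $wrappers = $xpath->query(
        '//div[contains(concat(" ", normalize-space(@class), " "), " excerpt ")]'
    );
    foreach ($wrappers as $post) {
        // Author link: the first <a> inside this post whose href contains "/author/".
        $author = $xpath->query('.//a[contains(@href, "/author/")]', $post)->item(0);
        // Assumed comment element; change this query to match the real page.
        $comments = $xpath->query('.//span[@class="comments"]', $post)->item(0);

        $posts[$post->getAttribute('id')] = [
            'author'   => $author ? trim($author->textContent) : null,
            'comments' => $comments ? (int) trim($comments->textContent) : null,
        ];
    }

    return $posts;
}

print_r(extractPosts($html));
```

For the live site you would pass the result of `file_get_contents('http://www.example.com')` instead of the sample string. Looping over the wrapper divs and querying relative to each one is what keeps an author tied to the right comment count, which is awkward to guarantee with a single regex over the whole page.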

Upvotes: 5
