Reputation: 11

How to use Regex for Static HTML code (PHP)

I am new to regular expressions, and I am just not getting the hang of them yet.

I have grabbed the HTML content of a given webpage using cURL and PHP. This webpage never changes its structure. The results on the page are dependent on a search function, but the HTML tags are always the same. I need to grab the resulting data from the page depending on what search terms were entered.

The data I need is:

<h1 class="location_only">(555) 555-5555 is a Landline</h1>

So I need to grab whatever is in between

<h1 class="location_only"> and </h1>

If I have $data, which holds the resulting HTML, how do I run a regular expression against it and echo the data I find as $result?

Upvotes: 0

Answers (5)

anubhava

Reputation: 785058

You've been cautioned enough not to use regex to parse HTML, so here is DOM parser based code to extract your value:

$html = <<< EOF
<html>
<head>
<title>Some Title</title>
</head>
<body>
<H1 class="location_only">(555) 555-5555 is a Landline</H1>
</body>
</html>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$value = $xpath->evaluate("string(//h1[@class='location_only']/text())"); 
echo "Your H1 Value=[$value]\n"; // prints text between <h1> and </h1>

OUTPUT:

Your H1 Value=[(555) 555-5555 is a Landline]

Upvotes: 0

ZZ-bb

Reputation: 2167

You can select text between tags with this search pattern:

<span id="result1">(.*?)</span>

The capture group returns "(555) 555-5555 is a Landline" if your HTML is: <span id="result1">(555) 555-5555 is a Landline</span>.

See preg_match() for further info on how to echo the result.
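
As a minimal sketch of how that could look for the markup in the question: the pattern below targets the `<h1 class="location_only">` tag instead of the `<span>` above, and the sample string in `$data` is made up for illustration (in practice it would be the cURL response). It assumes the heading appears once and that the attribute is written exactly as shown.

```php
<?php
// Sample input standing in for the cURL response (an assumption for illustration).
$data = '<body><h1 class="location_only">(555) 555-5555 is a Landline</h1></body>';

// Lazy (.*?) stops at the first closing tag; the "i" flag ignores tag case and
// the "s" flag lets "." match newlines inside the heading.
$pattern = '~<h1 class="location_only">(.*?)</h1>~is';

$result = null;
if (preg_match($pattern, $data, $matches) === 1) {
    $result = trim($matches[1]); // capture group 1 holds the text between the tags
}

echo $result ?? 'No match found', "\n";
```

A pattern like this would likely stop matching if the page ever adds another attribute to the tag or changes the quoting, which is the usual argument for a parser.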

Also look into an HTML DOM parser, as suggested by others. Maybe I shouldn't have answered at all...

Upvotes: 1

red

Reputation: 2040

Both answers telling you not to use regex and to use a DOM parser instead are correct. However, if the structure of the page doesn't change, a quick and dirty regex will do the trick just fine, provided you have well-defined start and end points for reference.

Upvotes: 0

Jeff Lambert

Reputation: 24661

Please do not use regular expressions to parse HTML.

Please use an HTML parser, such as Simple HTML DOM Parser. Your problem may seem localized, but it is not. Even if it were, problems of this type tend to grow in scope later, which will cause you a massive headache even if you get it working with regular expressions.

Upvotes: 2

Surreal Dreams

Reputation: 26380

You can't reliably extract information from HTML with a regex. You can, however, use an HTML parser, like DOMDocument::loadHTML. This will take your HTML from a string, and then you can use functions like getElementById or getElementsByTagName to find your values. There are other HTML parsers out there as well.
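
A minimal sketch of that approach, using `getElementsByTagName` because the heading in the question carries a class and no id. The sample string in `$data` is an assumption standing in for the fetched page.

```php
<?php
// Sample input standing in for the fetched page (an assumption for illustration).
$data = '<html><body><h1 class="location_only">(555) 555-5555 is a Landline</h1></body></html>';

libxml_use_internal_errors(true); // keep warnings about imperfect markup quiet
$doc = new DOMDocument();
$doc->loadHTML($data);

$result = null;
// getElementsByTagName returns every <h1>; keep the first one with the wanted class.
foreach ($doc->getElementsByTagName('h1') as $h1) {
    if ($h1->getAttribute('class') === 'location_only') {
        $result = trim($h1->textContent);
        break;
    }
}

echo $result ?? 'No match found', "\n";
```

The exact string comparison on `class` would miss an element carrying several classes; for that case an XPath query or splitting the attribute on whitespace may be a better fit.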

Upvotes: 0
