Reputation: 11
I am new to regular expressions, and I am just not getting the hang of them yet.
I have grabbed HTML content from a given webpage using cURL and PHP. This webpage never changes its structure. The results on the page depend on a search function, but the HTML tags are always the same. I need to grab the resulting data from the page, whatever search terms were entered.
The data I need is:
<h1 class="location_only">(555) 555-5555 is a Landline</h1>
So I need to grab whatever is in between <h1 class="location_only"> and </h1>.
If I have $data, which is the resulting HTML, how do I put that into a regular expression and echo the data I find as $result?
Upvotes: 0
Views: 192
Reputation: 785058
You have already been cautioned against using regex to parse HTML, so here is DOM parser based code to extract your value:
$html = <<< EOF
<html>
<head>
<title>Some Title</title>
</head>
<body>
<H1 class="location_only">(555) 555-5555 is a Landline</H1>
</body>
</html>
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$value = $xpath->evaluate("string(//h1[@class='location_only']/text())");
echo "Your H1 Value=[$value]\n"; // prints text between <h1> and </h1>
OUTPUT:
Your H1 Value=[(555) 555-5555 is a Landline]
Upvotes: 0
Reputation: 2167
You can select text between tags with this search pattern:
<span id="result1">(.*?)</span>
The capture group returns "(555) 555-5555 is a Landline" if your markup is: <span id="result1">(555) 555-5555 is a Landline</span>.
See preg_match() for details on how to retrieve and echo the result.
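As a minimal sketch of that approach, here is the same non-greedy pattern adapted to the <h1 class="location_only"> tag from the question. The sample string assigned to $data is an assumption standing in for the HTML fetched with cURL, and the pattern only works while the opening tag is written exactly this way.

```php
<?php
// Hypothetical input: in practice $data would hold the HTML returned by cURL.
$data = '<html><body><h1 class="location_only">(555) 555-5555 is a Landline</h1></body></html>';

// Non-greedy capture of everything between the opening and closing tags.
// "~" is used as the delimiter so the "/" in </h1> needs no escaping.
// "i" makes the match case-insensitive; "s" lets "." match newlines too.
$pattern = '~<h1 class="location_only">(.*?)</h1>~is';

if (preg_match($pattern, $data, $matches) === 1) {
    $result = trim($matches[1]); // first capture group
} else {
    $result = null;              // no match, or a regex error
}

echo $result ?? 'No match found', "\n";
```

preg_match() returns 1 on a match, 0 on no match, and false on error, so comparing against 1 covers both failure cases.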
Also look into an HTML DOM parser, as suggested by others. Maybe I shouldn't have answered at all...
Upvotes: 1
Reputation: 2040
Both answers telling you to use a DOM parser instead of regex are correct. However, if the structure of the page doesn't change, a quick & dirty regex will do the trick just fine, given that you have well-defined start and end points for reference.
Upvotes: 0
Reputation: 24661
Please do not use regular expressions to parse HTML.
Please use an HTML parser, such as Simple HTML DOM Parser. Your problem may seem localized, but it is not. Even if it were, problems of this type tend to grow in scope later, which will cause you a massive headache even if you get it working with regular expressions.
Upvotes: 2
Reputation: 26380
You can't reliably extract information from HTML with a regex. You can, however, use an HTML parser, like DOMDocument::loadHTML. This will take your HTML from a string, and then you can use methods like getElementById or getElementsByTagName to find your values. There are other HTML parsers out there as well.
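A rough sketch of the getElementsByTagName route, assuming the target is the <h1 class="location_only"> element from the question. The sample markup in $data is made up for illustration (including the extra h1), and the class check is a plain string comparison, so it would miss an element carrying several classes.

```php
<?php
// Hypothetical input: in practice $data would hold the HTML returned by cURL.
$data = '<html><body><h1 class="other">Ignore me</h1>'
      . '<h1 class="location_only">(555) 555-5555 is a Landline</h1></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep warnings about sloppy markup quiet
$doc->loadHTML($data);
libxml_clear_errors();

$result = null;
// Walk every <h1> and keep the first one whose class attribute matches.
foreach ($doc->getElementsByTagName('h1') as $h1) {
    if ($h1->getAttribute('class') === 'location_only') {
        $result = trim($h1->textContent);
        break;
    }
}

echo $result ?? 'No matching h1 found', "\n";
```

getElementById would be simpler still, but it only applies when the element has an id attribute, which the h1 in the question does not.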
Upvotes: 0