Reputation: 2692
String I'm trying to parse.
<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>
What I'm trying to achieve (without all the other tags etc):
Action
Adventure
Casual
Early Access
Indie
RPG
Here's what I've tried:
function getTagInfo($content,$start,$end){
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $r[0];
}
return '0';
}
getTagInfo($html, '/?snr=1_5_9__408">', '</a>');
That only gives me one genre. I can't think of an algorithm that parses the rest as well, so how can I extract the other genres?
Upvotes: 2
Views: 216
Reputation: 3200
I would probably do this with REGEX also, but since there are already 4 posts with REGEX answers, I'll throw another idea out there. This may be overly simple, but you can use strip_tags to remove any HTML tags.
$string = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>';
print strip_tags($string);
This will return the following:
Genre: Action, Adventure, Casual, Early Access, Indie, RPG
Anyway, it's probably not how I'd go about doing it, but it's a one-liner that is really easy to implement.
I reckon you can also turn it into the array you're looking for by combining the preceding with some REGEX like this:
$string_array = preg_split('/,\s*/', preg_replace('/Genre:\s+/i', '', strip_tags($string)));
print_r($string_array);
That will give you the following:
Array
(
[0] => Action
[1] => Adventure
[2] => Casual
[3] => Early Access
[4] => Indie
[5] => RPG
)
Ha, sorry ... ended up throwing REGEX into the answer anyway. But it's still a one-liner. :)
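If you'd rather keep REGEX out of it entirely, something like the following might work. It's only a sketch: it assumes the text always starts with a single "Genre:" label and that genre names never contain commas. The shortened $string and the $genres variable are just illustrative.

```php
<?php
// Shortened sample input (assumption: same shape as the string in the question).
$string = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
        . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, '
        . '<a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>';

// Remove the markup, leaving "Genre: Action, Early Access, RPG".
$text = strip_tags($string);

// Split off the "Genre:" label; fall back to the whole text if there is no colon.
$parts = explode(':', $text, 2);
$list  = $parts[1] ?? $parts[0];

// Split on commas and trim the whitespace around each name.
$genres = array_map('trim', explode(',', $list));

print_r($genres);
```

It's no longer a one-liner, but each step is plain string handling.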
Upvotes: 0
Reputation: 146191
You may try something like this (DEMO):
function getTagInfo($html)
{
if( preg_match_all('/<a href=\"(.*?)\">/i', $html, $matches)) {
$result = array();
foreach($matches[1] as $href) {
$array = explode('/', $href);
$arr = $array[count($array) - 2];
$result[] = urldecode($arr);
}
return $result;
}
return false;
}
// Get an array
print_r(getTagInfo($html));
Output:
Array (
[0] => Action
[1] => Adventure
[2] => Casual
[3] => Early Access
[4] => Indie
[5] => RPG
)
Upvotes: 1
Reputation: 951
You can use preg_match_all:
$regex = '/<a.*?>(.*?)<\/a>/is';
preg_match_all($regex, $html, $matches);
$matches[1] will then be an array of the contents between the anchor tags, and you could iterate over it like this:
foreach ($matches[1] as $match)
{
echo $match .'<br>';
}
It would probably be better to use an actual HTML parser, as HTML is not a regular language.
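For the parser route, here is a minimal sketch using PHP's bundled DOM extension. It assumes the extension is enabled and that genre links can be recognised by "/genre/" in their href. The function name getGenres is hypothetical, not something from the question.

```php
<?php
// Hypothetical helper: returns the link text of every genre anchor in $html.
function getGenres(string $html): array
{
    $doc = new DOMDocument();

    // HTML fragments often trigger libxml warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $genres = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // Assumption: only links whose href contains "/genre/" are genres.
        if (strpos($link->getAttribute('href'), '/genre/') !== false) {
            $genres[] = trim($link->textContent);
        }
    }
    return $genres;
}

$html = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
      . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a><br>';

print_r(getGenres($html));
```

Unlike the regex, this keeps working if the anchors gain extra attributes or the attribute order changes.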
Upvotes: 1
Reputation: 535
You can use this code from another Stack Overflow thread:
PHP/regex: How to get the string value of HTML tag?
<?php
function getTextBetweenTags($string, $tagname) {
$pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return $matches[1];
}
$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
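Note that the function above uses preg_match, so it returns only the first match. A possible variant for the question's case, where every anchor is wanted, is sketched below. The name getAllTextBetweenTags is hypothetical and this is an adaptation, not code from the linked thread.

```php
<?php
// Hypothetical variant that returns the text of every matching element.
function getAllTextBetweenTags(string $string, string $tagname): array
{
    // Escape the tag name so it is safe to embed in the pattern.
    $tag = preg_quote($tagname, '/');

    // \b stops "a" from matching "<abbr>", [^>]* stays inside the opening tag,
    // and the lazy (.*?) captures each element separately.
    $pattern = "/<{$tag}\b[^>]*>(.*?)<\/{$tag}>/is";

    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

$html = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
      . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a><br>';

print_r(getAllTextBetweenTags($html, 'a'));
```

Nested tags of the same name would still confuse it, which is the usual limit of regex on HTML.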
Upvotes: 1
Reputation: 7269
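You can use a regular expression here:
<a.*?>(.*?)</a>
This regular expression captures the contents of every <a></a> element.
Try this PHP code:
preg_match_all('/<a.*?>(.*?)<\/a>/', $htmlString, $matches);
// $matches[1] holds the text captured between each pair of anchor tags
foreach ($matches[1] as $match) {
echo $match . " <br /> ";
}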
This will output:
Action
Adventure
Casual
Early Access
Indie
RPG
Upvotes: 1