Zera42

Reputation: 2692

How to parse html tags multiple times? PHP

This is the string I'm trying to parse:

<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>

What I'm trying to achieve (without all the other tags etc):

Action Adventure Casual Early Access Indie RPG

Here's what I've tried:

function getTagInfo($content, $start, $end) {
    $r = explode($start, $content);
    if (isset($r[1])) {
        $r = explode($end, $r[1]);
        return $r[0];
    }
    return '0';
}

getTagInfo($html, '/?snr=1_5_9__408">', '</a>');

That only gives me one genre. I can't think of an algorithm that would parse the rest as well, so how can I extract the other genres?

Upvotes: 2

Views: 216

Answers (5)

Quixrick

Reputation: 3200

I would probably do this with REGEX also, but since there are already 4 posts with REGEX answers, I'll throw another idea out there. This may be overly simple, but you can use strip_tags to remove any HTML tags.

$string = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, <a href="http://store.steampowered.com/genre/Adventure/?snr=1_5_9__408">Adventure</a>, <a href="http://store.steampowered.com/genre/Casual/?snr=1_5_9__408">Casual</a>, <a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a>, <a href="http://store.steampowered.com/genre/Indie/?snr=1_5_9__408">Indie</a>, <a href="http://store.steampowered.com/genre/RPG/?snr=1_5_9__408">RPG</a><br>';

print strip_tags($string);

This will print the following:

Genre: Action, Adventure, Casual, Early Access, Indie, RPG

Anyway, it's probably not how I'd go about doing it, but it's a one-liner that is really easy to implement.

I reckon you can also turn it into the array you're looking for by combining the preceding with some REGEX like this:

$string_array = preg_split('/,\s*/', preg_replace('/Genre:\s+/i', '', strip_tags($string)));

print_r($string_array);

That will give you the following:

Array
(
    [0] => Action
    [1] => Adventure
    [2] => Casual
    [3] => Early Access
    [4] => Indie
    [5] => RPG
)

Ha, sorry ... ended up throwing REGEX into the answer anyway. But it's still a one-liner. :)
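
If you really want to keep regex out of it, the same idea can be sketched with plain string functions. This is only a rough sketch: it assumes the label always ends with a colon and the genres are comma-separated, and the variable names ($text, $list, $genres) are made up for the example.

```php
<?php
// Shortened copy of the string from the question.
$string = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
        . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a><br>';

// Remove the markup, leaving "Genre: Action, Early Access".
$text = strip_tags($string);

// Drop everything up to and including the first colon (assumed to end the label).
$pos  = strpos($text, ':');
$list = ($pos === false) ? $text : substr($text, $pos + 1);

// Split on commas and trim the whitespace around each genre.
$genres = array_map('trim', explode(',', $list));

print_r($genres);
```

It is no longer a one-liner, but every step is easy to follow.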

Upvotes: 0

The Alpha

Reputation: 146191

You may try something like this:

function getTagInfo($html)
{
    if( preg_match_all('/<a href=\"(.*?)\">/i', $html, $matches)) {
        $result = array();
        foreach($matches[1] as $href) {
            $array = explode('/', $href);
            $arr = $array[count($array) - 2];
            $result[] = urldecode($arr);
        }
        return $result;
    }
    return false;
}

// Get an array
print_r(getTagInfo($html));

Output:

Array ( 
    [0] => Action 
    [1] => Adventure 
    [2] => Casual 
    [3] => Early Access 
    [4] => Indie 
    [5] => RPG 
)
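
The function above picks the genre by position (second-to-last piece after splitting on '/'), which depends on the trailing slash before the query string. A slightly more defensive sketch, assuming the genre is always the last segment of the URL path, could use parse_url instead. The helper name genreFromHref is hypothetical.

```php
<?php
// Hypothetical helper: return the decoded last path segment of a URL.
function genreFromHref(string $href): string
{
    // Keep only the path, e.g. "/genre/Early%20Access/"; the query string is ignored.
    $path = parse_url($href, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }

    // Strip the surrounding slashes, then take the last segment.
    $segments = explode('/', trim($path, '/'));

    return rawurldecode(end($segments));
}

echo genreFromHref('http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408');
// prints "Early Access"
```

You would call it on each $href inside the foreach loop in place of the explode/count lines.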

Upvotes: 1

vvanasten

Reputation: 951

You can use preg_match_all:

$regex = '/<a.*?>(.*?)<\/a>/is';
preg_match_all($regex, $html, $matches);

$matches[1] will then be an array of the contents between the anchor tags and you could iterate over it like this:

foreach ($matches[1] as $match)
{
  echo $match .'<br>';
}

It would probably be better to use an actual HTML parser, as HTML is not a regular language.
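
As a rough illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument). The helper name getGenres and the shortened $html string are assumptions made for the example, not something from the question.

```php
<?php
// Shortened copy of the string from the question.
$html = '<b>Genre:</b> <a href="http://store.steampowered.com/genre/Action/?snr=1_5_9__408">Action</a>, '
      . '<a href="http://store.steampowered.com/genre/Early%20Access/?snr=1_5_9__408">Early Access</a><br>';

// Hypothetical helper: return the text of every <a> element in an HTML fragment.
function getGenres(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $genres = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $genres[] = trim($link->textContent);
    }

    return $genres;
}

print_r(getGenres($html));
```

With the shortened string this yields an array containing "Action" and "Early Access". Unlike the regex, it does not care about attribute order or line breaks inside the tags.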

Upvotes: 1

Aprilsnar

Reputation: 535

You can use this code from another Stack Overflow thread, "PHP/regex: How to get the string value of HTML tag?":

 <?php
function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

$str = '<textformat leading="2"><p align="left"><font size="10">get me</font></p></textformat>';
$txt = getTextBetweenTags($str, "font");
echo $txt;
?>
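
Note that preg_match stops at the first match, so the function above returns at most one value. To collect every genre, one possible adaptation (a sketch only; the name getAllTextBetweenTags is hypothetical) switches to preg_match_all and makes the quantifiers lazy:

```php
<?php
// Hypothetical variant of getTextBetweenTags() that returns every match.
function getAllTextBetweenTags(string $string, string $tagname): array
{
    $tag = preg_quote($tagname, '/');

    // \b keeps "a" from matching tags such as <abbr>; [^>]* skips the attributes;
    // (.*?) is lazy so each match ends at the nearest closing tag;
    // the "s" flag lets "." match newlines, and "i" ignores case.
    $pattern = "/<{$tag}\b[^>]*>(.*?)<\/{$tag}>/is";

    preg_match_all($pattern, $string, $matches);

    return $matches[1];
}

$str = '<b>Genre:</b> <a href="/genre/Action/">Action</a>, <a href="/genre/Early%20Access/">Early Access</a><br>';
print_r(getAllTextBetweenTags($str, 'a'));
```

For the sample $str this returns "Action" and "Early Access".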

Upvotes: 1

Sharikov Vladislav

Reputation: 7269

You can use a regular expression here:

<a.*?>(.*?)</a>

This regex captures the contents of every <a></a> element.

Try this php code:

preg_match_all('/<a.*?>(.*?)<\/a>/', $htmlString, $matches);

foreach ($matches[1] as $match) {
    echo $match . " <br /> ";
}

This will output:

Action 
Adventure 
Casual 
Early Access 
Indie 
RPG

Upvotes: 1
