PHP - Finding words in HTML tags

Question

I am looking for the best way to extract specific pieces of text from arbitrary fragments of HTML.

I cannot seem to figure out the regex for it.


    Dec 05, 2015 23:16:52
    rron7pam has won

    Attacker:
    Bliksem

The above are only examples, but for these I am interested in:

Getting the date (date = Dec 05, 2015 23:16:52)
Who won the battle (name = rron7pam)
The name of the attacker (name = Bliksem)
Attacker's ID (id = 255995)

There is a lot more information that I need from separate pieces of HTML, but if I can get one or two right, I should be able to work out the rest.

EDIT based on comments and answers: There could be any arbitrary text in the HTML, depending on how the report was set up (to hide the attacker's units, etc.). I need to look for patterns of specific HTML tags.

In the example above, "the text between the tags directly following a set of tags inside a" will be the date that I need.

Some examples of links with different formats:

https://enp2.tribalwars.net/public_report/70d3a2a55461e9eb09f543958b608304
https://enp2.tribalwars.net/public_report/5216e0e16c9d3657f981ce7e3cb02580

As far as I can tell, some elements will always be the same, e.g. the ones described above for getting the date.

Casimir et Hippolyte · Accepted Answer

An example with DOMDocument:

$url = 'https://enp2.tribalwars.net/public_report/70d3a2a55461e9eb09f543958b608304';

// prevent libxml parser warnings from being displayed
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTMLFile($url);

$xp = new DOMXPath($dom);

# let's find the interesting nodes:

// td that contains all the needed information (in other words, the nearest common ancestor)
$rootNode = $xp->query('(//table[@class="vis"]/tr/td[./h4])[1]')->item(0);

// first h4 node that contains the date
$dateNode = $xp->query('(./h4)[1]', $rootNode)->item(0);

// following h3 node that contains the player name
$winnerNode = $xp->query('(./following-sibling::h3)[1]', $dateNode)->item(0);

// link to the attacker's profile, inside the attacker info table
$attackerNode = $xp->query('(./table[@id="attack_info_att"]/tr/th/a)[1]', $rootNode)->item(0);

# extract special values

$winner = preg_replace('~ has won$~', '', $winnerNode->nodeValue);

$attackerID = html_entity_decode($attackerNode->getAttribute('href'));
$attackerID = parse_url($attackerID, PHP_URL_QUERY);
parse_str($attackerID, $queryVars);
$attackerID = $queryVars['id'];

$result = [ 'date' => $dateNode->nodeValue,
            'winner' => $winner,
            'attacker' => $attackerNode->nodeValue,
            'attackerID' => $attackerID ];

print_r($result);
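
The snippet above assumes that every query finds a node. Since the reports can be set up differently, a more defensive variant may be useful. Below is a minimal sketch of the same approach that can be tried without a network request. The function name `extractReport` and the inline HTML fragment are hypothetical: the tag layout and the `href` format are only inferred from the XPath queries above, not copied from a real report page.

```php
<?php
// Hypothetical fragment: layout inferred from the XPath queries in the
// answer, not taken from a real report. The href format is an assumption.
$html = <<<'HTML'
<table class="vis"><tr><td>
  <h4>Dec 05, 2015 23:16:52</h4>
  <h3>rron7pam has won</h3>
  <table id="attack_info_att"><tr>
    <th>Attacker:</th>
    <th><a href="/guest.php?screen=info_player&amp;id=255995">Bliksem</a></th>
  </tr></table>
</td></tr></table>
HTML;

// Returns the extracted fields, or null when an expected node is missing.
function extractReport(string $html): ?array
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Silence libxml warnings while parsing, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xp = new DOMXPath($dom);

    // Nearest common ancestor: the first td that directly contains an h4.
    $root = $xp->query('(//table[@class="vis"]/tr/td[./h4])[1]')->item(0);
    if ($root === null) {
        return null;
    }

    $dateNode = $xp->query('(./h4)[1]', $root)->item(0);
    if ($dateNode === null) {
        return null;
    }

    $winnerNode = $xp->query('(./following-sibling::h3)[1]', $dateNode)->item(0);
    $attackerNode = $xp->query('(./table[@id="attack_info_att"]/tr/th/a)[1]', $root)->item(0);
    if ($winnerNode === null || !($attackerNode instanceof DOMElement)) {
        return null;
    }

    // The DOM parser has already decoded &amp; in the attribute value.
    $query = (string) parse_url($attackerNode->getAttribute('href'), PHP_URL_QUERY);
    parse_str($query, $queryVars);

    return [
        'date'       => trim($dateNode->textContent),
        'winner'     => preg_replace('~\s+has won\s*$~', '', trim($winnerNode->textContent)),
        'attacker'   => trim($attackerNode->textContent),
        'attackerID' => $queryVars['id'] ?? null,
    ];
}

print_r(extractReport($html));
```

Returning `null` instead of dereferencing a missing node avoids a fatal error when a report hides part of the information. For a live report, the page would be fetched first (for example with `file_get_contents($url)`) and the resulting string passed to the function.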
