raygo

Reputation: 1398

Get content from html file

I have a list of HTML files. Each file contains many occurrences of the string onClick="rpd(SOME_NUMBER)". I know how to get the content of the HTML files; what I want is a list of the SOME_NUMBER values. I saw that I might need preg_match, but I'm horrible at regular expressions. I tried:

$file_content = file_get_contents($url);    
$pattern= 'onClick="rpd(#);"';
preg_match($pattern, $file_content);

As you can imagine, it didn't work. What would be the best way to get this done? Thanks!

Upvotes: 0

Views: 173

Answers (5)

Casimir et Hippolyte

Reputation: 89547

A clean way to do this is to use DOMDocument and XPath:

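$doc = new DOMDocument();
@$doc->loadHTMLFile($url); // @ suppresses warnings from malformed HTML
$xpath = new DOMXPath($doc);
// Select the onclick attribute of every element whose onclick contains "rpd("
$ress = $xpath->query("//*[contains(@onclick,'rpd(')]/attribute::onclick");
foreach ($ress as $res) {
    // Drop the leading "rpd(" and the trailing ")"
    echo substr($res->value, 4, -1) . "\n";
}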
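
The substr offsets above assume the attribute value is exactly rpd(N) with nothing after the closing parenthesis. A sketch that pulls the digits out of each onclick value with a small regex instead, so a trailing semicolon does no harm. It assumes the markup is loaded from a string; the helper name extractRpdNumbers and the sample $html are made up for illustration:

```php
<?php
// Hypothetical helper: collect the numbers passed to rpd() in onclick attributes.
function extractRpdNumbers(string $html): array
{
    $doc = new DOMDocument();
    // Suppress warnings from imperfect real-world HTML.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $numbers = [];
    // The HTML parser lowercases attribute names, so onClick becomes onclick.
    foreach ($xpath->query("//*[contains(@onclick, 'rpd(')]/@onclick") as $attr) {
        // Capture only the digits between "rpd(" and ")".
        if (preg_match('/rpd\((\d+)\)/', $attr->value, $m)) {
            $numbers[] = (int) $m[1];
        }
    }
    return $numbers;
}

// Made-up sample markup, with and without a trailing semicolon.
$html = '<a onClick="rpd(56)">a</a><a onClick="rpd(43);">b</a><p>no handler</p>';
print_r(extractRpdNumbers($html));
```

For files on disk, loadHTMLFile($url) could replace loadHTML($html), as in the answer above.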

Upvotes: 0

sdanzig

Reputation: 4500

$file_content='blah blah onClick="rpd(56)"; blah blah\nblah blah onClick="rpd(43)"; blah blah\nblah blah onClick="rpd(11)"; blah blah\n';
$pattern= '/onClick="rpd\((\d+)\)";/';
preg_match_all($pattern, $file_content, $matches);
print_r($matches);

That outputs:

Array
(
    [0] => Array
        (
            [0] => onClick="rpd(56)";
            [1] => onClick="rpd(43)";
            [2] => onClick="rpd(11)";
        )

    [1] => Array
        (
            [0] => 56
            [1] => 43
            [2] => 11
        )

)

You can play around with my example here: http://ideone.com/TzShPG

Upvotes: 0

entrapeneur

Reputation: 36

This should get it done:

    $file_content ='234=fdf donClick="rpd(5);"as23 f2 onClick="rpd(7);" dff fonClick="rpd(8);"';    
    $pattern= '/onClick="rpd\((\d+)\);"/';

    preg_match_all($pattern, $file_content,$matches);
    var_dump( $matches);

The output is like this:


    array (size=2)
    0 => 
    array (size=3)
      0 => string 'onClick="rpd(5);"' (length=17)
      1 => string 'onClick="rpd(7);"' (length=17)
      2 => string 'onClick="rpd(8);"' (length=17)
    1 => 
    array (size=3)
      0 => string '5' (length=1)
      1 => string '7' (length=1)
      2 => string '8' (length=1)
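
The question writes the attribute as onClick="rpd(SOME_NUMBER)" without a semicolon, so a pattern that requires the ; may miss matches in the real files. A sketch of a more tolerant variant, assuming the semicolon may or may not be present and that the attribute's letter case can vary (the sample string is made up):

```php
<?php
// Made-up sample covering the variants: no semicolon, semicolon, lowercase attribute.
$file_content = '<a onClick="rpd(5)">x</a> <a onClick="rpd(7);">y</a> <a onclick="rpd(8)">z</a>';

// ;? makes the semicolon optional; the i flag makes the whole pattern case-insensitive.
$pattern = '/onClick="rpd\((\d+)\);?"/i';

preg_match_all($pattern, $file_content, $matches);

// $matches[1] holds the captured digits as strings; cast them to integers.
$numbers = array_map('intval', $matches[1]);
print_r($numbers);
```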

Upvotes: 1

Lajos Veres

Reputation: 13725

Maybe something like this? preg_match stops after the first match, so use preg_match_all to collect every number:

preg_match_all('/onClick="rpd\((\d+)\);"/', $file_content, $matches);
print_r($matches[1]);

Upvotes: 1

utdemir

Reputation: 27216

I don't know PHP, but the regular expression to match that would be:

'onClick="rpd\(([0-9]+)\)"'

Note that we need to escape the literal parentheses with \ because of their special meaning. The unescaped pair of parentheses around [0-9]+ forms a capturing group that separates the digits from the rest of the match.

If preg_match also supports lookahead/lookbehind expressions:

 '(?<=onClick="rpd\()[0-9]+(?=\)")'

will also work.
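
PHP's preg_* functions use PCRE, which supports lookahead and fixed-length lookbehind, so the second pattern can be used once delimiters are added. A minimal sketch with made-up sample input:

```php
<?php
// Made-up sample input.
$file_content = 'x onClick="rpd(56)" y onClick="rpd(43)" z';

// Lookbehind and lookahead assert the surrounding text without consuming it,
// so each full match is just the digits and no capture group is needed.
$pattern = '/(?<=onClick="rpd\()[0-9]+(?=\)")/';

preg_match_all($pattern, $file_content, $matches);

// With no capture groups, the numbers are in $matches[0].
print_r($matches[0]);
```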

Upvotes: 0
