PHP - Extracting two values from a line

Question

I'm a beginner with regular expressions and am working on a server where I cannot install anything (does using DOM methods require installing anything?).

I have a problem that I cannot solve with my current knowledge. I would like to extract the album id and the image URL from the line below. The file contains more lines and other URL elements, but the album ids and image URLs I need all appear in lines similar to the one below:

So in this case I would like to get '774' and 'http://img255.imageshack.us/img00/000/000001.png'

I've seen multiple examples of extracting just the URL or one other element from a string, but I need to keep both values together and store them in one database record.

Any help is really appreciated!

nickb · Accepted Answer

Since you are new to this, I'll explain how you can use PHP's HTML parser, DOMDocument, to extract what you need. You should not use regular expressions here: they are inherently error-prone when parsing HTML and can easily produce false positives.
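On the install question: DOMDocument belongs to PHP's DOM extension, which is bundled with PHP and enabled by default, so normally nothing needs to be installed. A host can still disable it, so a quick check like this minimal sketch confirms it is available before you rely on it:

```php
<?php
// DOMDocument is provided by PHP's DOM extension (enabled by default).
// Report whether it is available on the current server.
$available = extension_loaded('dom') && class_exists('DOMDocument');

echo $available
    ? "DOM extension is available\n"
    : "DOM extension is missing; ask the host to enable it\n";
```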

To start, let's say you have your HTML:

$html = '';

And now, we load that into DOMDocument:

$doc = new DOMDocument;
$doc->loadHTML( $html);
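HTML scraped from real pages is often malformed, and loadHTML() prints a warning for every problem it finds. A common pattern is to collect those warnings internally instead. This is a sketch; the href in the sample markup is hypothetical, and only the album id and image URL come from the question:

```php
<?php
// Hypothetical sample: the href is made up for illustration.
$html = '<p>Some text</p>'
      . '<a href="http://example.com/gallery.php?album=774">'
      . '<img src="http://img255.imageshack.us/img00/000/000001.png"></a>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument;
$loaded = $doc->loadHTML($html);

// Discard the collected errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $loaded ? "Loaded\n" : "Failed to load\n";
```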

Now that the HTML is loaded, it's time to find the elements we need. Let's assume your document can contain other tags as well, so we only want the <a> tags that have an <img> tag as a direct child. Once we have the correct nodes, we extract the album id from the link's href and the image URL from the <img> tag's src. So, let's have at it:

$results = array();

// Loop over all of the <a> tags in the document
foreach( $doc->getElementsByTagName( 'a') as $a) {
    // If there are no children, continue on
    if( !$a->hasChildNodes()) continue;

    // Find the child <img> tag, if it exists
    foreach( $a->childNodes as $child) {
         if( $child->nodeType == XML_ELEMENT_NODE && $child->tagName == 'img') {
             // Now we have the <a> tag in $a and the <img> tag in $child
             // Get the information we need:
             parse_str( parse_url( $a->getAttribute('href'), PHP_URL_QUERY), $a_params);
             $results[] = array( $a_params['album'], $child->getAttribute('src'));              
         }
    }
}
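The album id comes from two built-in functions: parse_url() with PHP_URL_QUERY returns only the part of the href after the "?", and parse_str() decodes that into an array. The sketch below uses a hypothetical href. The is_string() guard is there because parse_url() returns null when the href has no query string, and passing null to parse_str() is deprecated as of PHP 8.1:

```php
<?php
// Hypothetical href; only the album value (774) comes from the question.
$href = 'http://example.com/gallery.php?album=774&page=2';

// PHP_URL_QUERY returns just the part after "?", or null if there is none.
$query = parse_url($href, PHP_URL_QUERY);   // "album=774&page=2"

$params = array();
if (is_string($query)) {
    // parse_str() writes the decoded key/value pairs into $params.
    parse_str($query, $params);
}

echo isset($params['album']) ? $params['album'] : 'no album', "\n"; // prints 774
```

Note that the extracted id is a string ('774'); cast it with (int) if you need a number.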

Running print_r( $results); now gives us:

Array
(
    [0] => Array
        (
            [0] => 774
            [1] => http://img255.imageshack.us/img00/000/000001.png
        )

)
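As an alternative to the nested loops (a different technique from the one above, not a correction of it), DOMXPath, which ships with the same DOM extension, can select every <img> that is a direct child of an <a> in a single query. A sketch with hypothetical sample markup, where the second link has no album parameter and is skipped:

```php
<?php
// Hypothetical sample markup; the hrefs are made up for illustration.
$html = '<a href="http://example.com/gallery.php?album=774">'
      . '<img src="http://img255.imageshack.us/img00/000/000001.png"></a>'
      . '<a href="http://example.com/about.php">About</a>';

$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$results = array();

// "//a/img" selects every <img> that is a direct child of an <a>.
foreach ($xpath->query('//a/img') as $img) {
    $a = $img->parentNode; // the enclosing <a> element
    $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
    $params = array();
    if (is_string($query)) {
        parse_str($query, $params);
    }
    if (isset($params['album'])) {
        $results[] = array($params['album'], $img->getAttribute('src'));
    }
}

print_r($results);
```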

Note that this omits basic error checking. For example, in the inner foreach loop you can make sure you successfully parsed an album parameter from the <a> tag's href attribute, like so:

if( isset( $a_params['album'])) {
    $results[] = array( $a_params['album'], $child->getAttribute('src'));        
}
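Putting the loop and the isset() check together, the answer's approach can be wrapped in a reusable function. This is a sketch: the function name extract_album_images and the sample input are hypothetical. The empty-string guard matters because loadHTML() rejects empty input (a warning in PHP 7, a ValueError in PHP 8):

```php
<?php
// Hypothetical helper wrapping the answer's loop plus the isset() check.
// Returns a list of array(album_id, image_src) pairs.
function extract_album_images($html)
{
    $results = array();
    if ($html === '') {
        return $results; // loadHTML() rejects empty input
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        foreach ($a->childNodes as $child) {
            // Only consider <img> elements that are direct children of <a>.
            if ($child->nodeType !== XML_ELEMENT_NODE || $child->nodeName !== 'img') {
                continue;
            }
            $query = parse_url($a->getAttribute('href'), PHP_URL_QUERY);
            $params = array();
            if (is_string($query)) {
                parse_str($query, $params);
            }
            if (isset($params['album'])) {
                $results[] = array($params['album'], $child->getAttribute('src'));
            }
        }
    }
    return $results;
}

// Hypothetical input: one matching link and one without an album parameter.
$pairs = extract_album_images(
    '<a href="http://example.com/gallery.php?album=774">'
    . '<img src="http://img255.imageshack.us/img00/000/000001.png"></a>'
    . '<a href="http://example.com/photo.php"><img src="http://example.com/x.png"></a>'
);
print_r($pairs);
```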

Every function I've used in this can be found in the PHP documentation.
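The question also asks to store each album id and image URL together in one database record. One way is a PDO prepared statement that inserts each pair as a single row. This sketch assumes a hypothetical table album_images and uses an in-memory SQLite database (requires the pdo_sqlite driver) only so it runs on its own; on a real server you would use your own DSN and credentials:

```php
<?php
// Pairs in the same shape as $results above: array(album_id, image_src).
$results = array(
    array('774', 'http://img255.imageshack.us/img00/000/000001.png'),
);

// In-memory SQLite keeps this sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table: one row per album id / image URL pair.
$pdo->exec('CREATE TABLE album_images (album_id INTEGER, image_url TEXT)');

// Placeholders keep both values in one record without building SQL by
// string concatenation.
$stmt = $pdo->prepare('INSERT INTO album_images (album_id, image_url) VALUES (?, ?)');
foreach ($results as $pair) {
    $stmt->execute(array((int) $pair[0], $pair[1]));
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM album_images')->fetchColumn();
echo $count, " row(s) stored\n";
```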
