Unable to get proper output from regex

Question

Here's my regex code:

Name:<\/h5>.*?(.*?)(



Here's the content:

Name:

Josh Taguibao
Click to view Profile


I am able to get my output, which is

Josh Taguibao


However, if the content changes to something like this:

Name:

Josh Taguibao
Click to view Profile


I only get Josh instead of the whole name.

What should I add to my pattern?

mickmackusa · Accepted Answer

If you don't want to use an HTML parser (which the SO community strongly urges at every chance), you can match just the text you need and leave the tags out of the match:

Code: (PHP Demo) (Pattern Demo)

$string='Name:

Josh Taguibao
Click to view Profile';

echo preg_match('~Name:.*?\s*\K.*?(?=\s*


Output:
Josh Taguibao

Notes:

~ is used as the pattern delimiter so that the / characters in the pattern don't need to be escaped.
\K in the pattern means: "start the fullstring match from here".
(?=...) is a positive lookahead, which is used to halt the fullstring match before matching a newline followed by  or |  (normally I would write (?=\s(?:|\|)) but it was actually fewer steps the verbose way)

The s modifier/flag at the end of the pattern permits the . (dots) to additionally match newlines.
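
To see the notes above in action, here is a minimal sketch. The markup is a hypothetical stand-in, because the original HTML is not shown in full here: the h5, div, br and a tags and the info-name class placement are assumptions. The pattern is an illustrative one built from the same pieces (~ delimiters, \K, a lookahead, the s flag) and is not necessarily the pattern from the demo.

```php
<?php
// Hypothetical markup: tag names and attributes are assumed for illustration.
$string = "<h5>Name:</h5>\n<div class=\"info-name\">\n  Josh Taguibao<br>\n  <a href=\"#\">Click to view Profile</a>\n</div>";

// ~ delimiters : the / in </h5> needs no escaping.
// \K           : drop everything matched so far from the fullstring match.
// (?=\s*<)     : lookahead, stop before optional whitespace and the next tag.
// s flag       : lets . match the newlines between the tags.
$pattern = '~Name:</h5>.*?<div[^>]*>\s*\K.*?(?=\s*<)~s';

if (preg_match($pattern, $string, $match)) {
    echo $match[0], PHP_EOL; // the fullstring match holds only the name
}
```

Because \K resets the start of the match, no capture group is needed; the whole name, including the space between the two words, is in $match[0].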


Now, DOMDocument is not my strong suit, but I slapped together this snippet that will work on your sample text. (DOMDocument Demo)
$html='Name:

Josh Taguibao
Click to view Profile';

$dom=new DOMDocument; 
$dom->loadHTML($html); 
$name=$dom->getElementsByTagName('div')->item(0)->nodeValue; // or ->textContent
echo trim($name);
// same output as regex method

nodeValue and textContent are effectively the same (for this case anyhow) in that they both return the tag-free text from the div element.

The manual says of textContent: "The text content of this node and its descendants."
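
A small sketch of that equivalence, assuming a hypothetical fragment in which part of the name sits inside a nested span (the markup is invented for illustration and is not the asker's actual HTML):

```php
<?php
// Hypothetical markup: the nested <span> is an assumption, used to show
// that descendant text is included.
$html = '<h5>Name:</h5><div class="info-name"> <span>Josh</span> Taguibao </div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$div = $dom->getElementsByTagName('div')->item(0);

// For an element node, PHP returns the same tag-free text from both properties.
$same = ($div->nodeValue === $div->textContent);
$name = trim($div->textContent);

var_dump($same);
echo $name, PHP_EOL;
```

The trim() call matters here because the text inside the div keeps its leading and trailing whitespace.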


Or if you need to isolate the first occurring element which has the class info-name, then you can use XPath: (Demo)
$dom = new DOMDocument();
$dom->loadHTML($html);
var_export(
    trim(
        (new DOMXPath($dom))
        ->query('//*[@class="info-name"]')
        ->item(0)
        ->nodeValue
    )
);
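
One caveat with @class="info-name" is that it compares the whole attribute value, so it only matches when info-name is the element's sole class. If the element might carry additional classes (an assumption; the sample only shows the exact match), a token-based test is a common alternative. A minimal sketch, with hypothetical markup and a made-up second class:

```php
<?php
// Hypothetical markup: the extra "highlight" class is assumed for illustration.
$html = '<h5>Name:</h5><div class="info-name highlight"> Josh Taguibao </div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: finds nothing, since the value is "info-name highlight".
$exact = $xpath->query('//*[@class="info-name"]');

// Token match: pad the normalized class list with spaces, then look for " info-name ".
$token = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " info-name ")]'
);

// Guard against an empty result before reading the first node.
$name = $token->length > 0 ? trim($token->item(0)->nodeValue) : null;
var_export([$exact->length, $name]);
```

Checking length before calling item(0) avoids reading a property on null when nothing matches.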
