Reputation: 149
I'm having trouble with a regular expression while trying to capture some data from this HTML:
<ul>
<li>Nombre de mots à traduire : 41 mots.</li>
<li>Nombre de mots partiellement traduits : 164 mots.</li>
<li>Nombre de mots traduits : 792 mots.</li>
<li>Nombre de correspondances exactes : 808 mots.</li>
<li>Nombre de répétitions internes : 71 mots.</li>
<li>Total : 1876 mots.</li>
</ul>
I need to get the quantity of 'mots' for every <li> with a PHP regex, but the : is glued to the number and I can't get it.
For the first one I'm trying (?<=\btraduire : \s)(\w+) but it doesn't work... I can't modify the HTML in any way, and I can't use html_entity_decode().
The HTML changes dynamically: the length of these numbers will vary, and this is just one example.
Any thoughts?
EDIT:
Okay, with (\d+)\smots I can get it!! =D But what if I have:
<p>
Langue source : FRA (FRA)<br/>
Langue cible : ESP (ESP)
</p>
And I want to get the "FRA (FRA)" and "ESP (ESP)", any idea?
Upvotes: 0
Views: 226
Reputation: 55720
If you need the quantity of mots for each <li>, you should probably use a regex like this:
(\d+)\smots
Note, however, that if you're trying to parse HTML you're probably better off using an HTML parser, as regular expressions have a hard time with non-regular syntax (e.g. HTML, XML).
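As a rough sketch of the parser route, assuming PHP's bundled DOM extension is available: the parser decodes entities on its own, so the regex only has to deal with the text of each <li>. The sample string, the assumed &nbsp; before the colon, and the variable name $counts are illustrative and not from the question.

```php
<?php
// Shortened sample of the list from the question.
// Assumption: the real markup has an &nbsp; entity before each colon.
$html = '<ul>
<li>Nombre de mots à traduire&nbsp;: 41 mots.</li>
<li>Total&nbsp;: 1876 mots.</li>
</ul>';

// Keep libxml warnings about the HTML fragment out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$counts = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // textContent is the already-decoded text of the <li>.
    // Only the digits are read here, so the charset of the accented
    // letters does not affect the result.
    if (preg_match('~(\d+)\s+mots~', $li->textContent, $m)) {
        $counts[] = (int) $m[1];
    }
}

print_r($counts);
```

This keeps the regex down to the (\d+)\s+mots idea and leaves tag and entity handling to the parser.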
UPDATE
For your second query, I would try something like this:
Langue.*([A-Z]{3})\s\(\1\)
In the above, the first capture group should be the language. The \1 in the last part of the regex refers to the first capture group, which means that FRA (FRA) would match, but FRA (BLA) would not.
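A minimal PHP sketch of that backreference behaviour, using the two lines from the question as a hard-coded string. The variable names $languages and $mismatch are made up for the illustration.

```php
<?php
// The paragraph from the question, as a plain string.
$html = "<p>\nLangue source : FRA (FRA)<br/>\nLangue cible : ESP (ESP)\n</p>";

// Group 1 is the three-letter code; \1 requires the same code inside the parentheses.
$pattern = '~Langue.*([A-Z]{3})\s\(\1\)~';

preg_match_all($pattern, $html, $m);
$languages = $m[1];

// A line whose two codes differ: preg_match() returns 0 (no match).
$mismatch = preg_match($pattern, 'Langue source : FRA (BLA)');

print_r($languages);
var_dump($mismatch);
```

Note that group 1 holds only the code (FRA, ESP). To get the full "FRA (FRA)" text, the whole [A-Z]{3}\s\(...\) part would need its own capture group.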
Upvotes: 1
Reputation: 89557
You can use this:
preg_match_all('~[0-9]+(?= mots\.</li>)~', $html, $matches);
print_r($matches);
or more explicit:
preg_match_all('~(?J)<li>(?:Nombre de (?<what>[^&]++)|(?<what>Total))[^0-9]+(?<quantity>[0-9]+)[^<]*</li>~i', $html, $matches, PREG_SET_ORDER);
print_r($matches);
For your edit:
preg_match_all('~Langue (?<target>[^&\s]++)(?:&nbsp;|\s)*:\s*(?<language>[^\r\n<]+)\s*~i', $html, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
    printf("\n%s\t%s", $match['target'], $match['language']);
}
Upvotes: 1