Reputation: 280

Split_on_title() and preg_replace() on data pulled from another website with file_get_contents()

I've pulled data from another website using file_get_contents().

This is the relevant part of the source code:

<font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>

I split the source on the opening and closing tags to pull 1,22 € from the string:

$split_on_title = preg_split("<font style=\"font-size:10px;color:#123333;font-weight:BOLD;\">", $source);
$split_on_endtitle = preg_split("</font>", $split_on_title[1]);
$title = $split_on_endtitle[0];

When I echo $title, Firefox shows:

>1,22 â‚¬<

Then I used preg_replace() on the string:

preg_replace('> â‚¬<', '', $title);

PHP then shows this warning: Warning: preg_replace(): No ending delimiter '>' found in....

How can I pull out the clean value 1,22 €, or at least just 1,22? Thanks in advance.

EDIT:

I understand that it is difficult to help with the data I gave, so here is a larger excerpt:

<tr>
    <td width="80" align="left" valign="top">
        <b> Price:</b>
    </td>
    <td align="left"  valign="top">
        <font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>
    </td>
</tr>

I need help to pull 1,22 € from this source.

Upvotes: 2

Answers (3)

Ozan Atmar

Reputation: 280

@pavlovich's answer gave me an output of >1,22 €<, so I used:

$title = ltrim($title, '>'); 
$title = rtrim($title, '<');

to remove the stray angle brackets.

I know this is not the right way to do it, but it solved my problem.
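
For what it's worth, the stray > and < seem to come from preg_split() itself: PHP accepts < and > as a matching pair of regex delimiters, so the brackets around the tag are consumed as delimiters and only the text between them is used as the pattern. The preg_replace() warning has the same cause: the leading > is read as the opening delimiter, and no closing > follows. A minimal sketch of a plain-string alternative with explode(), assuming the opening tag is always exactly the one shown in the question (the variable names $open, $parts and $rest are only illustrative):

```php
<?php
// Sample input copied from the question; in practice this would be the
// string returned by file_get_contents().
$source = '<font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>';

// explode() splits on a literal string, so no regex delimiters are involved.
$open  = '<font style="font-size:10px;color:#123333;font-weight:BOLD;">';
$parts = explode($open, $source, 2);

// Everything after the opening tag, or an empty string if the tag is absent.
$rest = $parts[1] ?? '';

// Keep only the text before the closing tag.
$title = explode('</font>', $rest, 2)[0];

echo $title; // 1,22 €
```

With this, no ltrim()/rtrim() cleanup should be needed, because the angle brackets are part of the split string and never end up in $title.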

Upvotes: 1

slapyo

Reputation: 2991

Why not use preg_match and grab everything between the font tag?

$re = "/<font.*>(.*)<\\/font>/i"; 
$str = "<font style=\"font-size:10px;color:#123333;font-weight:BOLD;\">1,22 €</font>"; 

preg_match($re, $str, $matches);
echo $matches[1];

Here's how the pattern breaks down.

<font matches the characters <font literally (case insensitive)
.* matches any character (except newline)
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
> matches the character > literally
1st Capturing group (.*)
    .* matches any character (except newline)
    Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
< matches the character < literally
\/ matches the character / literally
font> matches the characters font> literally (case insensitive)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
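
Because both .* quantifiers are greedy, the pattern above could over-match if one line ever held more than one font tag. A hedged variant, run here against the larger excerpt from the question: it uses [^>]* to stay inside the opening tag and a lazy (.*?) to stop at the first closing tag. The second preg_match(), which keeps only the number, is an optional extra and assumes the price always uses a comma as the decimal separator.

```php
<?php
// Sample markup copied from the question's larger excerpt.
$html = <<<'HTML'
<tr>
    <td width="80" align="left" valign="top">
        <b> Price:</b>
    </td>
    <td align="left"  valign="top">
        <font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>
    </td>
</tr>
HTML;

// [^>]* cannot run past the end of the opening tag; (.*?) is lazy and stops
// at the first </font>. Using ~ as the delimiter avoids escaping the slash.
$price = null;
if (preg_match('~<font[^>]*>(.*?)</font>~is', $html, $matches)) {
    $price = trim($matches[1]);
}

// Optional: keep only the numeric part (digits with an optional ",dd").
$amount = null;
if ($price !== null && preg_match('~\d+(?:,\d+)?~', $price, $num)) {
    $amount = $num[0];
}

echo $price, "\n";  // 1,22 €
echo $amount, "\n"; // 1,22
```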

Upvotes: 1

pavlovich

Reputation: 1942

Please add the required UTF-8 declaration to the <head> section of your HTML page:

<meta charset="UTF-8" />

It is missing, which is why the euro sign is not rendered properly.

More details on this and other meta tags: http://www.w3schools.com/tags/tag_meta.asp
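
As a rough illustration of why â‚¬ shows up: in UTF-8 the euro sign is stored as three bytes, and a browser that falls back to a single-byte encoding such as Windows-1252 displays each byte as its own character. The sketch below prints those bytes and also sends the charset as an HTTP header from PHP, which can be done alongside the meta tag. It assumes the script file itself is saved as UTF-8 and that nothing has been output before header() is called.

```php
<?php
// Declare the encoding in the HTTP response as well as in the <meta> tag.
// header() must run before any output, hence the guard.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Assumes this source file is saved as UTF-8.
$euro  = '€';
$bytes = bin2hex($euro);

// Three bytes: e2, 82, ac. Read one byte at a time as Windows-1252,
// they display as "â", "‚" and "¬".
echo $bytes; // e282ac
```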

Upvotes: 1
