Muhambi

Reputation: 3522

Issue with Str_Replace

I'm a beginner programmer building a fairly simple website scraper that stores the scraped information in a private MySQL database, as a way to learn more about programming.

Here's the HTML I am trying to scrape:

<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
                        <span id="lblIngName" class="ingredient-name">ground beef chuck</span>

                    </p>
                </label>
            </li>

<li id="liIngredient" data-ingredientid="5838" data-grams="454">
                <label>
                    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl02$cbxIngredient" /></span>
                    <p class="fl-ing" itemprop="ingredients">
                        <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
                        <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>

                    </p>
                </label>
            </li>

After scraping the data, I am trying to use str_replace to strip everything except the ingredient text: "2 pounds ground beef chuck" in the first example, or "1 pound bulk Italian sausage" in the second.

Here's my attempt:

$ingredients = str_replace('#<label>\s<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s<p class="fl-ing" itemprop="ingredients">\s#', null, $ingredients);
              echo $ingredients;

In theory, this should remove everything up to the span with id="lblIngAmount". Where am I going wrong? The text is identical before and after the str_replace call. Why?

Thanks for any and all help! If you need any more details, I'll be glad to give them!

Upvotes: 3

Views: 153

Answers (2)

Sylver

Reputation: 8967

Don't use regex to parse HTML.

See How to parse HTML.

Regex would work in this specific case, but since this is a learning project, you want to do it right.
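For reference, here is a minimal sketch of why the original call changed nothing and what the regex route might look like. It assumes the markup is exactly as posted. The `\s` in the question's pattern is widened to `\s*` here, because the real whitespace between tags is a newline plus several spaces, and the final `strip_tags` cleanup step is an addition for illustration, not part of the original attempt.

```php
<?php
// One ingredient from the question. A nowdoc keeps the "$" characters in the
// name attribute from being interpolated as PHP variables.
$html = <<<'HTML'
<label>
    <span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
    <p class="fl-ing" itemprop="ingredients">
        <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
        <span id="lblIngName" class="ingredient-name">ground beef chuck</span>
    </p>
</label>
HTML;

// Same pattern as in the question, with \s widened to \s* (assumption: any
// amount of whitespace may sit between the tags).
$pattern = '#<label>\s*<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s*<p class="fl-ing" itemprop="ingredients">\s*#';

// str_replace() searches for the pattern as literal text. That exact text is
// not in the HTML, so the input comes back unchanged.
$literal = str_replace($pattern, '', $html);
var_dump($literal === $html); // bool(true)

// preg_replace() interprets the same string as a regular expression.
$stripped = preg_replace($pattern, '', $html);

// Illustrative cleanup: drop the remaining tags and collapse whitespace.
$text = trim(preg_replace('/\s+/', ' ', strip_tags($stripped)));
echo $text, "\n"; // 2 pounds ground beef chuck
```

This works only as long as the site keeps emitting the same attributes in the same order, which is the main reason a parser is the safer choice.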

Upvotes: 2

kittycat

Reputation: 15045

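You want preg_replace() here: str_replace() searches for its first argument as a literal string, so your regular expression is never interpreted as a pattern and nothing is replaced. That said, you should not really be using regular expressions to manipulate HTML. Use PHP's DOMDocument instead.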
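A minimal sketch of the DOMDocument approach, assuming the markup looks like the two list items in the question (trimmed here to the parts that matter). The XPath expressions and the `$ingredients` array are illustrative choices, not the only way to do it.

```php
<?php
// Trimmed copy of the two <li> elements from the question.
$html = <<<'HTML'
<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
            <span id="lblIngName" class="ingredient-name">ground beef chuck</span>
        </p>
    </label>
</li>
<li id="liIngredient" data-ingredientid="5838" data-grams="454">
    <label>
        <p class="fl-ing" itemprop="ingredients">
            <span id="lblIngAmount" class="ingredient-amount">1 pound</span>
            <span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>
        </p>
    </label>
</li>
HTML;

// Real-world HTML (duplicate ids, for example) makes libxml emit warnings.
// Collect them internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$ingredients = [];

// Visit every <p itemprop="ingredients"> and read its two spans by class.
foreach ($xpath->query('//p[@itemprop="ingredients"]') as $p) {
    $amount = $xpath->query('.//span[@class="ingredient-amount"]', $p)->item(0);
    $name   = $xpath->query('.//span[@class="ingredient-name"]', $p)->item(0);
    $ingredients[] = trim($amount->textContent) . ' ' . trim($name->textContent);
}

print_r($ingredients);
// Array
// (
//     [0] => 2 pounds ground beef chuck
//     [1] => 1 pound bulk Italian sausage
// )
```

Because the lookup is by element and attribute rather than by exact surrounding text, it keeps working if the checkbox markup or the whitespace changes.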

Upvotes: 2
