Reputation: 3522
I'm a beginner programmer writing a fairly simple website scraper that stores information in a private MySQL database, to learn more about programming.
Here's the HTML I am trying to scrape:
<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
<label>
<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
<p class="fl-ing" itemprop="ingredients">
<span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
<span id="lblIngName" class="ingredient-name">ground beef chuck</span>
</p>
</label>
</li>
<li id="liIngredient" data-ingredientid="5838" data-grams="454">
<label>
<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl02$cbxIngredient" /></span>
<p class="fl-ing" itemprop="ingredients">
<span id="lblIngAmount" class="ingredient-amount">1 pound</span>
<span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>
</p>
</label>
</li>
After scraping the data, I am trying to use str_replace to strip everything except the ingredient text: 2 pounds ground beef chuck in the first example, or 1 pound bulk Italian sausage in the second.
Here's my attempt:
$ingredients = str_replace('#<label>\s<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name=".*?" /></span>\s<p class="fl-ing" itemprop="ingredients">\s#', null, $ingredients);
echo $ingredients;
In theory, this should remove everything up to the span id="lblIngAmount" part. Where am I going wrong? The text is identical before and after the str_replace call. Why?
Thanks for any and all help! If you need any more details, I'll be glad to give them!
Upvotes: 3
Views: 153
Reputation: 8967
Don't use regex to parse HTML.
See How to parse HTML.
Regex would work in this specific case, but since this is a learning project, you want to do it right.
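For reference, here is a minimal sketch of the parser-based approach using PHP's built-in DOMDocument and DOMXPath classes. The helper name extractIngredients() is hypothetical, and the XPath queries assume the exact class names (ingredient-amount, ingredient-name) and the data-ingredientid attribute shown in the question. A different page layout would need different queries.

```php
<?php
// Sketch: pull "amount name" strings out of the recipe markup with
// DOMDocument and DOMXPath instead of regular expressions.
// extractIngredients() is a hypothetical helper name; the XPath queries
// assume the class names used in the question's markup.

function extractIngredients(string $html): array
{
    $doc = new DOMDocument();

    // Scraped markup often has problems (here: duplicate ids), so collect
    // libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $ingredients = [];

    // One <li> per ingredient, identified by its data-ingredientid attribute.
    foreach ($xpath->query('//li[@data-ingredientid]') as $li) {
        // Relative queries (leading ".") search only inside the current <li>.
        $amount = $xpath->query('.//span[@class="ingredient-amount"]', $li)->item(0);
        $name   = $xpath->query('.//span[@class="ingredient-name"]', $li)->item(0);

        if ($amount !== null && $name !== null) {
            $ingredients[] = trim($amount->textContent) . ' ' . trim($name->textContent);
        }
    }

    return $ingredients;
}

// The markup from the question (nowdoc, so the "$" characters are not interpolated).
$html = <<<'HTML'
<li id="liIngredient" data-ingredientid="3914" data-grams="907.2">
<label>
<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl01$cbxIngredient" /></span>
<p class="fl-ing" itemprop="ingredients">
<span id="lblIngAmount" class="ingredient-amount">2 pounds</span>
<span id="lblIngName" class="ingredient-name">ground beef chuck</span>
</p>
</label>
</li>
<li id="liIngredient" data-ingredientid="5838" data-grams="454">
<label>
<span class="checkbox-formatted"><input id="cbxIngredient" type="checkbox" name="ctl00$CenterColumnPlaceHolder$recipeTest$recipe$ingredients$rptIngredientsCol1$ctl02$cbxIngredient" /></span>
<p class="fl-ing" itemprop="ingredients">
<span id="lblIngAmount" class="ingredient-amount">1 pound</span>
<span id="lblIngName" class="ingredient-name">bulk Italian sausage</span>
</p>
</label>
</li>
HTML;

foreach (extractIngredients($html) as $line) {
    echo $line, PHP_EOL;
}
```

Run against the question's markup, this should print one line per ingredient. Because the lookups are anchored on element classes rather than on whitespace, attribute order, or the generated name attributes, changes in indentation or in the ctl01/ctl02 numbering do not affect them.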
Upvotes: 2
Reputation: 15045
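You want preg_replace() here: str_replace() does plain literal replacement and does not interpret regular expressions, so it searches for your pattern as literal text and finds nothing to replace. However, you should not really be using regular expressions to manipulate HTML. Use PHP's DOMDocument instead.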
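A minimal sketch of the difference, using a hypothetical, shortened subject string rather than the full scraped page. str_replace() looks for the pattern as literal text (delimiters and backslashes included), so nothing changes, while preg_replace() compiles it as a regular expression. Note that \s matches exactly one whitespace character, so if the real markup has a line break plus indentation between tags, the pattern would need \s* or \s+ there. The sketch passes '' rather than null as the replacement, since PHP 8.1 deprecates passing null to that parameter.

```php
<?php
// Hypothetical, shortened subject: exactly one space between the tags.
$subject = '<label> <span class="ingredient-amount">2 pounds</span>';

// Same style of pattern as in the question: '#' delimiters, \s for whitespace.
$pattern = '#<label>\s<span class="ingredient-amount">#';

// str_replace() searches for the pattern as a literal string; the subject
// does not contain the text "#<label>\s<span ...", so it is returned unchanged.
$literal = str_replace($pattern, '', $subject);

// preg_replace() treats the pattern as a regex, so \s matches the space
// and the matched prefix is removed.
$regex = preg_replace($pattern, '', $subject);

echo $literal, PHP_EOL;
echo $regex, PHP_EOL;
```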
Upvotes: 2