Reputation: 8722
I used python feedparser to get this item["description"] from a mashable feed:
<img alt="9f4397d9c05e474fa54291507ad9c03a" src="http://rack.2.mshcdn.com/media/ZgkyMDE2LzA0LzI2LzM0LzlmNDM5N2Q5YzA1LjMzODI0LmpwZwpwCXRodW1iCTU3NXgzMjMjCmUJanBn/393b8db2/53c/9f4397d9c05e474fa54291507ad9c03a.jpg" />
<div style="float: right; width: 50px;"><a href="http://twitter.com/share?via=Mashable&text=Nail+polish+stockings+are+exactly+what+you+need+for+a+lazy+summer+pedicure&src=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F" style="margin: 10px;"><img alt="Feed-tw" border="0" src="http://rack.1.mshcdn.com/assets/feed-tw-f7c0a094d16b7ee7c91a1e50839a8e00.jpg" /></a><a href="http://www.facebook.com/sharer.php?u=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F&src=sp" style="margin: 10px;"><img alt="Feed-fb" border="0" src="http://rack.1.mshcdn.com/assets/feed-fb-c0a21e8841794479b8086c32c6f24ba1.jpg" /></a></div>
<div>
<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>
<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails — thin stockings with pre-painted toenails.</p>
<div>
<p>SEE ALSO: <a href="http://mashable.com/2016/02/23/weiner-dog-ear-plugs/">Weiner dog ear plugs will help you sleep deeper than a newborn pup</a></p>
</div>
<figure>
<p><img class="" src="http://rack.1.mshcdn.com/media/ZgkyMDE2LzA0LzI2L2M1L3RvZW5haWxhcnRwLjI4NjBiLmpwZwpwCXRodW1iCTU3NXg0MDk2Pg/4f07495a/b32/toe-nail-art-polish-stockings-japan-10.jpg" /></p>
<div>
<p>Image: belle maison</p>
</div>
</figure>
<p>If you're worried about looking a little out-of-date with the classic stockings and open-toed heels that your grandma used to wear, don't fret. The stockings are designed to fit individual toes, giving your pedicure a better fit as well. <a href="http://mashable.com/2016/04/26/toe-nail-polish-stockings/">Read more...</a></p>
</div>
More about <a href="http://mashable.com/conversations/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Conversations</a>, <a href="http://mashable.com/pics/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Pics</a>, <a href="http://mashable.com/category/products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Products</a>, <a href="http://mashable.com/lifestyle/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Lifestyle</a>, and <a href="http://mashable.com/category/weird-products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Weird Products</a>
This is an awful lot of information. The part I only really need for readers is this:
<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>
<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails — thin stockings with pre-painted toenails.</p>
How do I get only this part? Should I just go for python regex? I am not too sure since almost all descriptions are different, so writing an expression for that will be difficult. Is there another RSS item element that only provides the info I want? Thanks!
Upvotes: 0
Views: 175
Reputation: 3405
If you want to go the re
way, you can do the following
pat = re.compile(r"<div>(.*?)</div>")
s = pat.search(html).group(1)
result = [line.strip() for line in s.strip().splitlines()[:2]]
# result
['<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>',
'<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails — thin stockings with pre-painted toenails.</p>']
But as you can see, it's dirty and likely to break. So one solution is to write a grammar and a tiny parser. But the robust and convenient way is to use a parser like Beautifulsoup or lxml.
Upvotes: 1
Reputation: 2301
As you guessed correctly, Regex will not be able to accomplish this task (obligatory link to this question). So your best bet is to feed your HTML to a parser like Beautifulsoup, and write your logic for the parsed DOM Object.
from bs4 import BeautifulSoup
soup = BeautifulSoup(my_input_html_string)
my_elements = soup.find_all('p')[0:2]
Obviously, this code assumes that you're always looking for the first two <p>
's in any given DOM you feed to it. You'll have to adjust your logic based on a consistency found by looking at different descriptions that your input provides.
Upvotes: 1