Reputation: 2891
I have a string containing HTML that serves as a teaser for the blog posts I am creating. The whole teaser is stored in a single field in the database.
My goal: when a user likes a blog post through the Facebook Like button, the right data should be displayed in their news feed. To do that, I need to extract the image path from the src="i-m-a-g-e--p-a-t-h" attribute of the first image in the teaser. This works when the user puts only one image in the teaser, but if they accidentally put in two or more, the whole thing crashes.
Furthermore, for the description field I need to extract the text inside the first <p> tag. A further problem is that a user can put an image inside that first tag.
I would very much appreciate it if an expert could help me resolve this; it has been bugging me for days.
A sample string, together with the regular expression I use for extracting src, can be found here: http://rubular.com/r/gajzivoBSf
Thanks!
Upvotes: 0
Views: 235
Reputation: 303224
Don't try to parse HTML by yourself. Let the professionals do it.
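require 'nokogiri'
# Parse the teaser as an HTML fragment rather than a full document
frag = Nokogiri::HTML.fragment( your_html_string )
# at_css returns the first match, or nil when nothing matches
first_img_src = frag.at_css('img')&.[]('src')
first_p_text = frag.at_css('p')&.text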
Upvotes: 2