Extracting paragraphs in Python using lxml

Question

I would like to extract the paragraphs from an HTML page using Python. I used the lxml module, but it doesn't do exactly what I am looking for.

from lxml import html
print(html.parse(url).xpath('//p')[1].text_content())

Here is the First Paragraph.
Here is the second Paragraph.
Paragraph Three."

I should add that different pages have different numbers of paragraphs, so I would like to build a list and put each paragraph into it.

jfs · Accepted Answer

from lxml import html
print(html.parse(url).xpath('//p/text()'))

['Here is the First Paragraph.', 'Here is the second Paragraph.', 
 'Paragraph Three."']
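One caveat: `//p/text()` selects only the text nodes that are direct children of each `<p>`, so a paragraph containing inline markup such as `<b>` comes back split into several list items. If you want exactly one string per paragraph, a list comprehension over `text_content()` is one option. Below is a minimal sketch; the sample HTML and the helper name `extract_paragraphs` are made up for illustration, and it parses an inline string with `html.fromstring` rather than fetching a URL.

```python
from lxml import html

# Hypothetical sample page; the <b> tag is there to show the difference.
SAMPLE = (
    "<html><body>"
    "<p>Here is the <b>First</b> Paragraph.</p>"
    "<p>Here is the second Paragraph.</p>"
    "<p>Paragraph Three.</p>"
    "</body></html>"
)


def extract_paragraphs(doc):
    """Return one string per <p> element, including text inside nested tags."""
    return [p.text_content() for p in doc.xpath('//p')]


doc = html.fromstring(SAMPLE)

# Direct text nodes only: the first paragraph is split around <b>.
text_nodes = doc.xpath('//p/text()')

# One entry per paragraph, regardless of how many paragraphs the page has.
paragraphs = extract_paragraphs(doc)

print(text_nodes)
print(paragraphs)
```

For a real page you would replace `html.fromstring(SAMPLE)` with `html.parse(url)` as in the answer above; the list grows or shrinks with the number of `<p>` elements found.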
