Simon

Parsing forum posts using lxml/python

When I run the code below, the text of a single div comes back as fifteen separate items in the list. I want the whole post as one item. It probably happens because of the <br> tags, but I am not sure how to solve it.

from lxml import html
import requests

page = requests.get('http://www.city-data.com/forum/economics/2056372-minimum-wage-vs-liveable-wage.html')

tree = html.fromstring(page.text)

details = tree.xpath('//div[contains(@id, "post_message_33583236")]/text()')

print(len(details))  # prints 15
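The split can be reproduced offline with a minimal sketch. The markup below and the id `post_message_1` are made up for illustration, and the live forum page may have changed since. The sketch only assumes the post is a div whose text is interrupted by <br> tags. XPath `text()` returns each text node that is a direct child of the div as its own string, so every `<br>` starts a new item. Text inside nested tags such as `<b>` is not returned at all.

```python
from lxml import html

# Hypothetical markup standing in for the forum post: the <br> elements
# split the div's direct text into several separate text nodes.
SNIPPET = """
<html><body>
<div id="post_message_1">First line<br>Second line<br><b>bold</b> third line</div>
</body></html>
"""

tree = html.fromstring(SNIPPET)

# /text() selects only text nodes that are direct children of the div,
# one string per node, so the post comes back in pieces.
pieces = tree.xpath('//div[contains(@id, "post_message_1")]/text()')
print(len(pieces))
print(pieces)
```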

Answers (1)

alecxe

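Select the element itself with XPath (not its text() nodes) and call its text_content() method, which returns the text of the element and all of its descendants as a single string: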

# Select the div element itself (no /text()), then flatten all of its text
details = tree.xpath('.//div[contains(@id, "post_message_33583236")]')[0]
print(details.text_content())

Prints:

With all the talk about raising the minimum wage, I think the real issue is that people are not getting a liveable wage anymore.  This applies to many skilled people too in which their job tries to pay them $10-13hr for $20-30hr type of work.

Not everyone deserves a raise at walmart or other low paying jobs.  I  think everyone should atleast prove themselves for 6 months to year then  start to gradually get a raise. You cant act a fool and get paid the same as people who work hard and try to move up in life. Even if walmart workers weren't making minimum wage and making  $11hr, you cant really do much making 22k a year other than live in a  cheap/borderline crime infested area

$11hr gets you about $1250 a month after taxes and health coverage at most jobs and ill list just the basic necessities in life
...
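The same fix can be checked offline. This sketch reuses a made-up snippet with the hypothetical id `post_message_1` in place of the real page. It selects the div element and calls `text_content()`, and it also shows two XPath-only alternatives that give the same single string here: wrapping the path in XPath's `string()` function, and joining the results of `//text()`. The guard before `[0]` is an addition so a missing post raises a clear error instead of an IndexError.

```python
from lxml import html

# Hypothetical markup standing in for the forum post.
SNIPPET = """
<html><body>
<div id="post_message_1">First line<br>Second line<br><b>bold</b> third line</div>
</body></html>
"""

tree = html.fromstring(SNIPPET)

# Select the element itself (no /text()), then flatten its text.
matches = tree.xpath('//div[contains(@id, "post_message_1")]')
if not matches:
    raise ValueError("post div not found")  # clearer than an IndexError on [0]
via_method = matches[0].text_content()

# Alternative 1: XPath string() returns the concatenated text of the
# first matching node as one string.
via_string = tree.xpath('string(//div[contains(@id, "post_message_1")])')

# Alternative 2: //text() collects text nodes at every depth; join them.
via_join = "".join(tree.xpath('//div[contains(@id, "post_message_1")]//text()'))

print(repr(via_method))
```

Note that text_content() inserts nothing where a <br> was, so in this snippet the lines run together ("First lineSecond line..."). The output in the answer keeps its paragraph breaks presumably because the forum's HTML source already has newline characters next to the <br> tags.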
