Grabbing items using CSS selectors within a Python script

Question

I've written some code in Python to get company details and contact names from a webpage, using CSS selectors to collect those items. However, when I run it, I get only the first portion of the "Contact Details" and "Contact" blocks: the text before the first "br" tag, rather than the full string. How can I get the full text of each block?

Script I'm trying with:

import requests
from lxml import html

tree = html.fromstring(requests.get("https://www.austrade.gov.au/SupplierDetails.aspx?ORGID=ORG8000000314&folderid=1736").text)
for title in tree.cssselect("div.contact-details"):
    cDetails = title.cssselect("h3:contains('Contact Details')+p")[0].text
    cContact = title.cssselect("h4:contains('Contact')+p")[0].text
    print(cDetails, cContact)

Elements containing the data I'm after:

Contact Details
Company Name: Distance Learning Australia Pty Ltd
Phone: +61 2 6262 2964
Fax: +61 2 6169 3168
Email: rto@dla.com.au
Web: http://dla.edu.au
Address
Suite 108A, 49 Phillip Avenue
Watson
ACT
2602
Contact
Name: Christine Jarrett
Phone: +61 2 6262 2964
Fax: +61 2 6169 3168
Email: chris.jarrett@dla.com.au

Results I'm getting:

Company Name: Distance Learning Australia Pty Ltd Name: Christine Jarrett

Results I'm after:

Company Name: Distance Learning Australia Pty Ltd
Phone: +61 2 6262 2964
Fax: +61 2 6169 3168
Email: rto@dla.com.au

Name: Christine Jarrett
Phone: +61 2 6262 2964
Fax: +61 2 6169 3168
Email: chris.jarrett@dla.com.au

Btw, I'd like to do this using CSS selectors only, not XPath. Thanks in advance.

Andersson · Accepted Answer

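Simply replace the text property with the text_content() method, as below, to get the required output. The text property returns only the text that precedes the element's first child (here, the first br tag), whereas text_content() returns the text of the element and all of its descendants: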

cDetails = title.cssselect("h3:contains('Contact Details')+p")[0].text_content()
cContact = title.cssselect("h4:contains('Contact')+p")[0].text_content()
