SIM

Reputation: 22440

Trouble parsing certain fields from a complicated webpage

I'm building a scraper and can't see what I did wrong, but when I run it, it neither fetches any data nor throws any error. I'm after three fields: phone, website and email. The email and website links appear to be hidden, so my XPath expressions for those two fields are rather messy. Any ideas would be highly appreciated. Here is what I've tried so far:

import requests
from lxml import html
def startpoint():
    url="https://www.truelocal.com.au/business/strata-report-sydney/sydney"
    page=requests.get(url, headers={"user-agent" : "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"})
    tree=html.fromstring(page.text)
    titles=tree.xpath('//div[@class="column"]')
    for title in titles:
        Phone=title.xpath(".//span[contains(concat(' ', @class, ' '), ' ng-binding ')]/text()")[0]
        Web=title.xpath('.//span[@class="text-frame"]')[0]
        Email=title.xpath('.//a[@class="iconed-text"]/@href')[0]
        print(Phone,Web,Email)

startpoint()

Elements for the items within:

<div class="column" ng-class="vm.getTabletClass()">
                    <bdp-details-contact-website listing="vm.listing" contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHavePrimaryWebsite()==true --><a class="iconed-text link-color-white-bck ng-scope" ng-if="vm.getHavePrimaryWebsite()==true" rel="nofollow" ng-click="vm.bdpEventTracking();">
  <span class="icon-holder">
    <i class="icon icon-computer-notebook-1"></i>
  </span>
  <span class="text-frame" ng-class="(vm.getHaveSecondaryWebsites()==true) ? 'with-aditional-item':''">
    <span ng-click="vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank')" role="button" tabindex="0">Visit website</span>
  </span>
</a><!-- end ngIf: vm.getHavePrimaryWebsite()==true --> <!-- iconed-text-->

<!-- ngRepeat: contact in vm.getSecondaryWebsites() --> <!-- iconed-text-->
</bdp-details-contact-website>
                    <a href="" class="iconed-text" ng-show="vm.isContactEmail" aria-hidden="false">
                      <span class="icon-holder">
                        <i class="icon icon-email"></i>
                      </span>
                      <span class="text-frame emailBusiness">
                        <span ng-click="vm.emailABusiness($event);" role="button" tabindex="0">Email this business</span>
                      </span>
                    </a> <!-- iconed-text-->
                    <div>
                        <bdp-details-contact-phone contacts="vm.listing.contacts" priority-number="vm.listing.preferences" class="ng-isolate-scope"><!-- ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 --><i class="icon-phone-call-2 ng-scope" ng-if="$index==0"></i><!-- end ngIf: $index==0 -->
  </span>
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0421 298 888</span><!-- end ngIf: !vm.isMobile -->
  </span>
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --><!-- ngIf: vm.haveNumbers --><span class="iconed-text ng-scope" ng-if="vm.haveNumbers" ng-repeat="number in vm.getNumbers()">
  <span class="icon-holder">
    <!-- ngIf: $index==0 -->
  </span>
  <span class="text-frame">
    <!-- ngIf: vm.isMobile -->
    <!-- ngIf: !vm.isMobile --><span ng-if="!vm.isMobile" class="ng-binding ng-scope">0478 151 999</span><!-- end ngIf: !vm.isMobile -->
  </span>
</span><!-- end ngIf: vm.haveNumbers --><!-- end ngRepeat: number in vm.getNumbers() --> <!-- iconed-text-->
</bdp-details-contact-phone>
                    </div>
                    <div>
                        <bdp-details-contact-fax contacts="vm.listing.contacts" class="ng-isolate-scope"><!-- ngIf: vm.getHaveFax()==true --> <!-- iconed-text-->
</bdp-details-contact-fax>
                    </div>
                    <div>
                        <bdp-details-abn-acn listing="vm.listing" class="ng-isolate-scope"><!-- ngIf: vm.haveAbn() -->
<!-- ngIf: vm.haveAcn() --></bdp-details-abn-acn>
                    </div>
                </div>

Upvotes: 0

Views: 149

Answers (1)

Tiny.D

Reputation: 6556

Analysis:

If you look at the page source, the body is very simple and does not contain the <div class="column">.

The site runs JavaScript that rewrites the HTML after the page loads, and the content you are looking for is written by that JavaScript. That's why the response you get from requests does not contain the full rendered source: the element <div class="column" ng-class="vm.getTabletClass()"> is not there, so the XPath returns an empty list, the for loop never runs, and you see neither data nor an error.
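To illustrate the point above, here is a minimal sketch. The RAW_SOURCE string is a made-up stand-in for what requests receives (the real page source will differ); the only assumption is that the rendered contact markup is absent from it.

```python
from lxml import html

# Hypothetical stand-in for the raw response body: an AngularJS shell
# without any rendered contact markup. Not the site's actual source.
RAW_SOURCE = """
<html><body>
  <div ui-view></div>
  <script src="app.js"></script>
</body></html>
"""

tree = html.fromstring(RAW_SOURCE)
columns = tree.xpath('//div[@class="column"]')

visited = 0
for column in columns:
    # Never reached: there is nothing to iterate over,
    # so no data is printed and no IndexError is raised.
    visited += 1

print(columns, visited)
```

An XPath query that matches nothing gives back an empty list rather than raising, which is why the script finishes silently.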

Solution:

1, If you inspect the website with Chrome, you can find the div with class="column" like the element in your question, so you could scrape only this part. However, your for loop will iterate over every div with class="column" and raise "list index out of range" whenever an expected sub-element is missing. You probably only need the first div with class="column" to get Phone, Web and Email: titles[0].
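A sketch of that idea, run against a trimmed copy of the rendered markup from the question rather than the live site. The helper first_or_none is a hypothetical name introduced here, and the second, empty column is added only to show the guard at work.

```python
from lxml import html

# Trimmed, hand-written copy of the rendered markup; the empty second
# column is an assumption added to demonstrate the guard.
RENDERED = """
<html><body>
  <div class="column">
    <span class="text-frame"><span class="ng-binding ng-scope">0421 298 888</span></span>
    <span class="text-frame"><span class="ng-binding ng-scope">0478 151 999</span></span>
  </div>
  <div class="column"></div>
</body></html>
"""

PHONE_XPATH = (
    ".//span[contains(concat(' ', normalize-space(@class), ' '), ' ng-binding ')]/text()"
)


def first_or_none(nodes):
    """Return the first item of an XPath result, or None if it is empty."""
    return nodes[0] if nodes else None


tree = html.fromstring(RENDERED)
columns = tree.xpath('//div[@class="column"]')

# Only the first column carries the contact details.
first_column = first_or_none(columns)
phones = [] if first_column is None else first_column.xpath(PHONE_XPATH)

# Indexing [0] on the empty column would raise IndexError; the guard returns None.
missing = first_or_none(columns[1].xpath(PHONE_XPATH))

print(phones, missing)
```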

2, You could try a WebDriver tool such as Selenium to simulate a real browser, which renders the JavaScript.
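A rough sketch of the Selenium route, assuming Chrome and a matching driver are installed. The function names, the timeout and the CSS selector used for waiting are illustrative choices, not something confirmed against the live site.

```python
from lxml import html

PHONE_XPATH = (
    "//div[@class='column']"
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' ng-binding ')]/text()"
)


def fetch_rendered_html(url, timeout=15):
    """Load the page in a real browser and return the DOM after JavaScript ran."""
    # Imported lazily so parse_phones() below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        driver.get(url)
        # Wait for a rendered phone span; this selector is an assumption.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div.column span.ng-binding")
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_phones(page_source):
    """Extract phone numbers from already rendered HTML."""
    tree = html.fromstring(page_source)
    return [text.strip() for text in tree.xpath(PHONE_XPATH) if text.strip()]


# Possible usage (opens a browser window, so it is left commented out):
# rendered = fetch_rendered_html(
#     "https://www.truelocal.com.au/business/strata-report-sydney/sydney")
# print(parse_phones(rendered))
```

Keeping the parsing separate from the browser step means the XPath can be checked against saved HTML without launching a browser each time.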

BTW: for Web and Email, the spans only carry an ng-click handler and the markup holds no real link target (the email anchor's href is empty), so your code will not work for that part.
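This can be checked against the markup posted in the question. The snippet below is a shortened copy of it, and the two XPath expressions are just one way to pick out the anchors.

```python
from lxml import html

# Shortened copy of the rendered markup from the question.
RENDERED = """
<html><body><div class="column">
  <a class="iconed-text link-color-white-bck ng-scope" rel="nofollow" ng-click="vm.bdpEventTracking();">
    <span class="text-frame">
      <span ng-click="vm.openLink(vm.getReadableUrl(vm.getPrimaryWebsite()),'_blank')" role="button">Visit website</span>
    </span>
  </a>
  <a href="" class="iconed-text" ng-show="vm.isContactEmail">
    <span class="text-frame emailBusiness">
      <span ng-click="vm.emailABusiness($event);" role="button">Email this business</span>
    </span>
  </a>
</div></body></html>
"""

tree = html.fromstring(RENDERED)

website_link = tree.xpath('//a[contains(@class, "link-color-white-bck")]')[0]
email_link = tree.xpath('//a[@class="iconed-text"]')[0]

website_href = website_link.get("href")  # the anchor has no href attribute at all
email_href = email_link.get("href")      # the attribute exists but is empty

print(repr(website_href), repr(email_href))
```

The URL and the address are resolved by the Angular handlers at click time, so they would have to come from somewhere other than these attributes.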

Upvotes: 1
