Arrajj

Reputation: 187

Scraping attribute values with Python lxml returns empty results

I am trying to scrape a site; its URL is in the code below.

The goal is to get the data stored in the element attributes, since the elements show no text when I inspect the page.

Here is the full XPath of an element:

/html/body/div[2]/div[3]/div/div[3]/section[1]/div/div[2]/div[1]

and the code:

import requests
from lxml import html
page = requests.get('https://www.meilleursagents.com/annonces/achat/nice-06000/appartement/')
tree = html.fromstring(page.content)

Trying to scrape the value of the 'data-wa-data' attribute with:

tree.xpath('/html/body/div[2]/div[3]/div/div[3]/section[1]/div/div[2]/div[1]/@data-wa-data')

yields an empty list.

The same issue occurs for another element that does have text:

tree.xpath('/html/body/div[2]/div[3]/div/div[3]/section[1]/div/div[2]/div[1]/div/a/div[1]/text()')

Upvotes: 0

Views: 100

Answers (1)

Shivam

Reputation: 620

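The problem is that this website only returns the complete HTML when the request carries a browser-like User-Agent header. requests sends its own default (python-requests/<version>) unless you override it, so the page you received was incomplete and your XPath expressions matched nothing. To get the complete page, pass a user-agent as a header.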

Note: This website blocks aggressively. You cannot even make two consecutive requests with the same user-agent. My advice would be to rotate proxies and user-agents, and to add a download delay between requests so you do not hit the server too rapidly.
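The rotation and delay described in the note could look roughly like the sketch below. It is only an illustration: the helper name polite_fetch, the second user-agent string, and the 2 to 5 second delay range are assumptions, not values the site is known to require, and proxy rotation is left out.

```python
import itertools
import random
import time

# Illustrative pool; the second string is an assumed example, not a tested value.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
]


def polite_fetch(urls, fetch, user_agents=USER_AGENTS,
                 min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Yield (url, response) pairs, cycling user-agents and pausing between requests.

    `fetch` is any callable accepting (url, headers=...), such as requests.get.
    """
    agents = itertools.cycle(user_agents)
    for index, url in enumerate(urls):
        if index > 0:
            # Random pause before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        headers = {'user-agent': next(agents)}
        yield url, fetch(url, headers=headers)
```

With requests installed, this could be used as `for url, page in polite_fetch(urls, requests.get): ...`, parsing each `page.content` with lxml as in the code that follows.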

Code

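import requests
from lxml import html

# Send a browser-like User-Agent; without it the site returns incomplete HTML.
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0'
}

page = requests.get('https://www.meilleursagents.com/annonces/achat/nice-06000/appartement/', headers=headers)

tree = html.fromstring(page.content)

# Select every listing by its class instead of relying on an absolute path,
# then read the data-wa-data attribute of each one.
print(tree.xpath('//div[@class="listing-item search-listing-result__item"]/@data-wa-data'))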

Output

['listing_id=1971029217|realtor_id=21407|source=listings_results', 'listing_id=1971046117|realtor_id=74051|source=listings_results', 'listing_id=1971051280|realtor_id=71648|source=listings_results', 'listing_id=1971053639|realtor_id=21407|source=listings_results', 'listing_id=1971053645|realtor_id=38087|source=listings_results', 'listing_id=1971053650|realtor_id=29634|source=listings_results', 'listing_id=1971053651|realtor_id=29634|source=listings_results', 'listing_id=1971053652|realtor_id=29634|source=listings_results', 'listing_id=1971053656|realtor_id=39588|source=listings_results', 'listing_id=1971053658|realtor_id=39588|source=listings_results', 'listing_id=1971053661|realtor_id=39588|source=listings_results', 'listing_id=1971053662|realtor_id=39588|source=listings_results']
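Each value in that list is a pipe-separated string of key=value pairs, so it can be turned into a dictionary if you need the individual fields. A minimal sketch, assuming every attribute follows the format shown in the output above (parse_wa_data is a hypothetical helper name, and the two sample strings are copied from that output):

```python
def parse_wa_data(value):
    """Split a 'k1=v1|k2=v2' string into a dict, skipping pieces without '='."""
    return dict(pair.split('=', 1) for pair in value.split('|') if '=' in pair)


# Two sample values copied from the output above.
raw = [
    'listing_id=1971029217|realtor_id=21407|source=listings_results',
    'listing_id=1971046117|realtor_id=74051|source=listings_results',
]

listings = [parse_wa_data(value) for value in raw]
print(listings[0])
# {'listing_id': '1971029217', 'realtor_id': '21407', 'source': 'listings_results'}
```

All values stay strings; convert listing_id and realtor_id with int() if you need numbers.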

Upvotes: 1
