Reputation: 2793
Recently i am working on exercise and in which i had extracted whole webpage source data . I am very much interested in area tag . In area tag i am very much interested in onclick attribute . Now how can we extract onclick attribute from particular element . Now our extracted data is looking like these ,
<area class="borderimage" coords="21.32,14.4,933.96,180.56" href="javascript:void(0);" onclick="return show_pop('78545','51022929357','1')" onmouseover="borderit(this,'black','<b>इंदौर, गुरुवार, 10 मई , 2018 <b><br><bआप पढ़ रहे हैं देश का सबसे व...')" onmouseout="borderit(this,'white')" alt="<b>इंदौर, गुरुवार, 10 मई , 2018 <b><br><bआप पढ़ रहे हैं देश का सबसे व..." shape="rect">
I am very much interested in onclick attribute and my code is like these which i had already done but nothing has worked ,
paper_url = 'http://epaper.bhaskar.com/indore/129/10052018/mpcg/1/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
# Total number of pages available in these product
page = requests.get(paper_url,headers = headers)
page_response = page.text
parser = html.fromstring(page_response)
XPATH_Total_Pages = '//div[contains(@class,"fs12 fta w100 co_wh pdt5")]//text()'
raw_total_pages = parser.xpath(XPATH_Total_Pages)
lastpage=raw_total_pages[-1]
print(int(lastpage))
finallastpage=int(lastpage)
reviews_list = []
XPATH_PRODUCT_NAME = '//map[contains(@name,"Mapl")]'
#XPATH_PRODUCT_PRICE = '//span[@id="priceblock_ourprice"]/text()'
#raw_product_price = parser.xpath(XPATH_PRODUCT_PRICE)
#product_price = raw_product_price
raw_product_name = parser.xpath(XPATH_PRODUCT_NAME)
XPATH_REVIEW_SECTION_2 = '//area[@class="borderimage"]'
reviews = parser.xpath(XPATH_REVIEW_SECTION_2)
product_name =raw_product_name
#result = product_name.find(',')
#finalproductname = slice[0:product_name]
print(product_name)
print(reviews)
for review in reviews:
#soup = BeautifulSoup(str(review), "html.parser")
#parser2.feed(str(review))
#allattr = [tag.attrs for tag in review.findAll('onclick')]
#print(allattr)
XPATH_RATING = './/area[@data-hook="onclick"]'
raw_review_rating = review.xpath(XPATH_RATING)
#cleaning data
print(raw_review_rating)
Upvotes: 1
Views: 63
Reputation: 4483
If I got it right - you need to get all onclick
attributes of <area>
tags on a page.
Try something like this:
import requests
from bs4 import BeautifulSoup
TAG_NAME = 'area'
ATTR_NAME = 'onclick'
url = 'http://epaper.bhaskar.com/indore/129/10052018/mpcg/1/'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
# there are 3 <area> tags on page; putting them into a list
area_onclick_attrs = [x[ATTR_NAME] for x in soup.findAll(TAG_NAME)]
print(area_onclick_attrs)
Output:
[
"return show_pophead('78545','51022929357','1')",
"return show_pop('78545','51022928950','4')",
"return show_pop('78545','51022929357','1')",
]
Upvotes: 1