Open a page programmatically in Python

Can you extract the VIN number from this webpage?

I tried urllib2.build_opener, requests, and mechanize, and I also set a User-Agent header, but none of them could see the VIN in the returned page.

opener = urllib2.build_opener()
opener.addheaders = [('User-agent',('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_7) ' 'AppleWebKit/535.1 (KHTML, like Gecko) ' 'Chrome/13.0.782.13 Safari/535.1'))]
page = opener.open(link)
soup = BeautifulSoup(page)

table = soup.find('dd', attrs = {'class': 'tip_vehicleStats'})
vin = table.contents[0]
print vin

Upvotes: 3

Answers (4)

krutitsky

Reputation: 1

You do not have to use Selenium. Just make one additional GET request:

import requests

stock_number = '123456789'  # stock number shown in the VEHICLE INFORMATION section
url = 'https://www.clearvin.com/ads/iaai/check?stockNumber={}&vin='.format(stock_number)
vin = requests.get(url).json()['car']['vin']
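
The one-liner above raises an exception if the response is not JSON or lacks the expected keys. A slightly more defensive sketch, assuming the response has the same `car` / `vin` shape used in this answer (the helper name `extract_vin` and the sample payload are made up for illustration):

```python
import json


def extract_vin(payload_text):
    """Return the VIN from a JSON response body, or None if it is absent.

    Assumes the {"car": {"vin": ...}} shape used in the answer above.
    """
    try:
        data = json.loads(payload_text)
    except ValueError:
        # Not JSON at all (for example, an HTML error page).
        return None
    if not isinstance(data, dict):
        return None
    car = data.get("car")
    if not isinstance(car, dict):
        return None
    vin = car.get("vin")
    return vin if isinstance(vin, str) and vin else None


# Made-up payload for illustration; not real output from the service.
sample = '{"car": {"vin": "EXAMPLEVIN0000000"}}'
print(extract_vin(sample))
```

With requests, this could be called as `extract_vin(requests.get(url).text)`.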

Upvotes: -1

Charles JOUBERT

Reputation: 159

You could use Selenium, which drives a real browser. This works for me:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# See: http://stackoverflow.com/questions/20242794/open-a-page-programatically-in-python
browser = webdriver.Firefox()  # Start a local Firefox session
browser.get("https://www.iaai.com/Vehicles/VehicleDetails.aspx?auctionID=14712591&itemID=15775059&RowNumber=0")  # Load the page

time.sleep(0.5)  # Crude wait so the page's JavaScript can fill in the VIN

# Find the <span> whose "id" attribute contains "ctl00_ContentPlaceHolder1_VINc_VINLabel".
# find_element(By.XPATH, ...) also works on Selenium 4, where find_element_by_xpath was removed.
e = browser.find_element(By.XPATH, "//span[contains(@id,'ctl00_ContentPlaceHolder1_VINc_VINLabel')]")
print(e.text)
# Works for me: 4JGBF7BE9BA648275

browser.close()
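
A fixed half-second sleep can be too short on a slow connection. One alternative is to poll until the element has text. The helper below is a minimal, library-independent sketch (the name `wait_for_text` and its parameters are hypothetical); Selenium also ships its own `WebDriverWait` class in `selenium.webdriver.support.ui` for the same purpose.

```python
import time


def wait_for_text(get_text, timeout=10.0, interval=0.25):
    """Call get_text() until it returns a non-empty string or the timeout expires.

    get_text is any zero-argument callable; exceptions it raises are treated
    as "not ready yet". Returns the text, or None on timeout.
    """
    deadline = time.time() + timeout
    while True:
        try:
            text = get_text()
        except Exception:
            text = None
        if text:
            return text
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

In the script above, `get_text` could be a lambda that looks up the span and returns its `.text`, called in place of the `time.sleep(0.5)` line.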

Upvotes: 2

shshank

Reputation: 2641

You can use a browser automation tool for this.

For example, this simple Selenium script does the job:

from selenium import webdriver
from bs4 import BeautifulSoup

link = "https://www.iaai.com/Vehicles/VehicleDetails.aspx?auctionID=14712591&itemID=15775059&RowNumber=0"
browser = webdriver.Firefox()
browser.get(link)
page = browser.page_source  # HTML after the browser has run the page's JavaScript
browser.quit()

soup = BeautifulSoup(page, "html.parser")

table = soup.find('dd', attrs={'class': 'tip_vehicleStats'})
vin = table.span.contents[0]  # table.contents is a list, so reach the span via table.span
print(vin)

BTW, table.contents[0] prints the entire span, including the span tags.

table.span.contents[0] prints only the VIN.

Upvotes: 5

Lennart Regebro

Reputation: 172239

Much of the information on that page is loaded and displayed by JavaScript (probably through Ajax calls), most likely as a deliberate protection against scraping. To scrape it you therefore have three options: drive a browser that runs JavaScript and control it remotely; write the scraper itself in JavaScript; or take the site apart, work out exactly what it loads with JavaScript and how, and see whether you can reproduce those calls yourself.
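
One way to check this diagnosis before reaching for a browser is to fetch the raw HTML with any of the libraries from the question and see whether the target element already contains text. A minimal sketch using only the Python 3 standard library follows. The names `SpanTextFinder` and `static_span_text` and the HTML snippet are hypothetical; the element id is borrowed from the Selenium answer above, and whether the real page serves that span empty is an assumption.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text inside the first <span> whose id contains id_part."""

    def __init__(self, id_part):
        HTMLParser.__init__(self)
        self.id_part = id_part
        self.inside = False
        self.found = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.found:
            if self.id_part in (dict(attrs).get("id") or ""):
                self.inside = True
                self.found = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


def static_span_text(html, id_part):
    """Return the span's text from raw HTML, '' if empty, None if missing."""
    finder = SpanTextFinder(id_part)
    finder.feed(html)
    finder.close()
    return finder.text.strip() if finder.found else None


# Made-up snippet standing in for the HTML a plain HTTP client receives.
raw_html = '<dd><span id="ctl00_ContentPlaceHolder1_VINc_VINLabel"></span></dd>'
print(repr(static_span_text(raw_html, "VINc_VINLabel")))
```

An empty string or None for the real page would suggest that the value is filled in later by JavaScript, which is the situation described above.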

Upvotes: 7
