Roger

Reputation: 85

Following a redirect when crawling and extracting data from a web page in Python

[http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber?execution=e1s1]

For example, with the tracking number LM920347139CN,
I want to extract the tracking history data, but the page uses a redirect.
How can I handle this? Ideally I would like a way to get the data without the presentation logic.

Upvotes: 0

Views: 808

Answers (1)

mhawke

Reputation: 87114

EDIT

Apparently there are REST and SOAP APIs available for tracking. See http://www.canadapost.ca/cpo/mc/business/productsservices/developers/services/tracking/default.jsf


The easiest (non-API) way is probably to use the mechanize module, which you can get from PyPI. You use it like a web browser: it follows the redirect for you and manages any cookies required by this particular web site. Example:

import mechanize

br = mechanize.Browser()
url = 'http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber'
response = br.open(url)
br.select_form('tapByTrackSearch:trackSearch')
br.form['tapByTrackSearch:trackSearch:trackNumbers'] = 'LM920347139CN'
response = br.submit()
html = response.read()

If you prefer to use requests, or if you need to support Python 3, a requests Session will also follow redirects and manage cookies as required:

import requests

s = requests.Session()
url = 'http://www.canadapost.ca/cpotools/apps/track/personal/findByTrackNumber'
response = s.get(url)

With requests, however, you will need to set up the required POST form fields (which I do not show here).
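One possible way to build those POST fields is to read them out of the form on the page you just fetched. The following is only a sketch under assumptions: the helper names (FormFieldCollector, build_form_post, fetch_tracking_html) are hypothetical, the form and field names are taken from the mechanize example above, and the code simply copies every named input of the form, which may not match what the site really expects (for instance, it may want only one particular submit button).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormFieldCollector(HTMLParser):
    """Collect the action URL and named <input> values of one form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.found = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # Match the form by its name or id attribute.
            if self.form_name in (attrs.get("name"), attrs.get("id")):
                self.in_form = True
                self.found = True
                self.action = attrs.get("action")
        elif tag == "input" and self.in_form and attrs.get("name"):
            # Keep hidden fields (session/view state) and default values.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_form_post(page_html, page_url, form_name, overrides):
    """Return (post_url, payload) for the named form, with overrides applied."""
    collector = FormFieldCollector(form_name)
    collector.feed(page_html)
    if not collector.found:
        raise ValueError("form %r not found in page" % form_name)
    payload = dict(collector.fields)
    payload.update(overrides)
    # A relative or missing action is resolved against the page URL.
    return urljoin(page_url, collector.action or ""), payload


def fetch_tracking_html(session, url, form_name, field_name, tracking_number):
    """GET the form page, then POST it back with the tracking number filled in.

    `session` is expected to behave like requests.Session (get/post methods).
    """
    page = session.get(url)
    post_url, payload = build_form_post(
        page.text, page.url, form_name, {field_name: tracking_number}
    )
    return session.post(post_url, data=payload).text
```

With s and url as defined above, calling fetch_tracking_html(s, url, 'tapByTrackSearch:trackSearch', 'tapByTrackSearch:trackSearch:trackNumbers', 'LM920347139CN') would roughly mirror the mechanize example, assuming the page still contains that form.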

Once you have the HTML you can use an HTML parser such as BeautifulSoup to process and extract the required data.

from bs4 import BeautifulSoup

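# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, 'html.parser')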
tracking_table = soup.find(id='tapListResultForm:table_2')
.
.
.

from which you can extract the tracking data.
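As a rough illustration of that last step, the table can be flattened into a list of rows. This is a sketch, not the site's actual layout: the helper name table_rows is hypothetical, and it assumes the tracking history is an ordinary HTML table of tr/th/td elements under the id used above.

```python
from bs4 import BeautifulSoup


def table_rows(html, table_id):
    """Return the text of each row of the table with the given id.

    Each row is a list of cell strings; an empty list is returned
    when no element with that id exists in the document.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Header and data cells are treated alike; whitespace is trimmed.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows
```

Calling table_rows(html, 'tapListResultForm:table_2') then gives plain Python lists, free of presentation markup, that can be written to CSV or processed further.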

Upvotes: 1
