Reputation: 13
I'm having trouble linking the pages together. I need a spider that follows the links on each page and grabs the required details. So far my code is able to grab the required information, but there are other pages too, and I need their information as well. The base_url contains the application listings; I want to collect all the links from that page, then switch to the next page and repeat, and finally visit each collected link to extract the details of each application (name, version number, etc.).
Right now I am able to collect all the information, but the pages are not linked together. How can I do that? Here is my code:
# extracting links
def linkextract(soup):
    print "\n extracting links of next pages"
    print "\n\n page 2 \n"
    sAll = [div.find('a') for div in soup.findAll('div', attrs={'class': ''})]
    for i in sAll:
        suburl = "" + i['href']  # checking pages
        print suburl
        pages = mech.open(suburl)
        content = pages.read()
        anosoup = BeautifulSoup(content)
        extract(anosoup)

app_url = ""
print app_url
#print soup.prettify()
page1 = mech.open(app_url)
html1 = page1.read()
soup1 = BeautifulSoup(html1)
print "\n\n application page details \n"
extractinside(soup1)
Assistance required, thank you.
Upvotes: 0
Views: 147
Reputation: 474161
Here's what you should start with:
import urllib2
from bs4 import BeautifulSoup

URL = 'http://www.pcwelt.de/download-neuzugaenge.html'

soup = BeautifulSoup(urllib2.urlopen(URL))
links = [tr.td.a['href']
         for tr in soup.find('div', {'class': 'boxed'}).table.find_all('tr')
         if tr.td]

for link in links:
    url = "http://www.pcwelt.de{0}".format(link)
    soup = BeautifulSoup(urllib2.urlopen(url))
    name = soup.find('span', {'itemprop': 'name'}).text
    version = soup.find('td', {'itemprop': 'softwareVersion'}).text
    print "Name: %s; Version: %s" % (name, version)
prints:
Name: Ashampoo Clip Finder HD Free; Version: 2.3.6
Name: Many Cam; Version: 4.0.63
Name: Roboform; Version: 7.9.5.7
...
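The snippet above handles a single listing page. To walk through the subsequent pages as well (which the question asks for), you can keep following the "next page" link until there is none. Here is a minimal, self-contained sketch of that loop, written in Python 3 syntax with the standard-library html.parser and a hypothetical PAGES dict standing in for real HTTP fetches:

```python
from html.parser import HTMLParser

class NextLinkParser(HTMLParser):
    """Records the href of the first <a> tag whose text is 'next'."""
    def __init__(self):
        super().__init__()
        self.in_a = False
        self.href = None
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.in_a = True
            self.href = dict(attrs).get('href')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_a = False

    def handle_data(self, data):
        if self.in_a and data.strip().lower() == 'next' and self.next_url is None:
            self.next_url = self.href

# Hypothetical pages: each one links to the next until the chain ends.
PAGES = {
    '/page1': '<a href="/page2">next</a>',
    '/page2': '<a href="/page3">next</a>',
    '/page3': '<p>no more pages</p>',
}

def crawl(start):
    """Follow 'next' links from start, returning every page visited."""
    visited = []
    url = start
    while url is not None and url not in visited:  # stop at the end or on a cycle
        visited.append(url)
        parser = NextLinkParser()
        parser.feed(PAGES[url])  # a real spider would feed the downloaded HTML here
        url = parser.next_url
    return visited

print(crawl('/page1'))
```

In the real spider, `parser.feed(PAGES[url])` would be replaced by feeding the HTML downloaded from `url`, and the per-application extraction (name, version, and so on) would happen inside the loop for each page visited.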
Hope that helps.
Upvotes: 2