Reputation: 799
I have to get many URLs from a website and then copy them into an Excel file. I'm looking for an automatic way to do that. The website is structured as a main page with about 300 links, and inside each link there are 2 or 3 links that are interesting to me. Any suggestions?
Upvotes: 0
Views: 1553
Reputation: 5728
You can use Beautiful Soup for parsing: http://www.crummy.com/software/BeautifulSoup/
More information is in the docs here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
I wouldn't suggest Scrapy, because you don't need it for the work you described in your question.
For example, this code uses the urllib2 library to open the Google homepage and find all the links in that output, as a list:
import urllib2
from bs4 import BeautifulSoup

# Download the page HTML
data = urllib2.urlopen('http://www.google.com').read()
# Parse it with Beautiful Soup, using the built-in HTML parser
soup = BeautifulSoup(data, 'html.parser')
# find_all('a') returns a list of every <a> tag on the page
print soup.find_all('a')
For handling Excel files, take a look at http://www.python-excel.org
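As a rough sketch of that last step, the scraped links could be written to an .xls file with xlwt (one of the libraries listed on that site); the filename and the links list here are just placeholders:

import xlwt

# Hypothetical list of links collected with Beautiful Soup
links = ['http://example.com/a', 'http://example.com/b']

workbook = xlwt.Workbook()
sheet = workbook.add_sheet('Links')

# Write one URL per row in the first column
for row, url in enumerate(links):
    sheet.write(row, 0, url)

workbook.save('links.xls')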
Upvotes: 0
Reputation: 392
If the links are in the HTML, you can use Beautiful Soup. This has worked for me in the past.

import urllib2
from bs4 import BeautifulSoup

page = 'http://yourUrl.com'
# Open the page and parse the HTML
opened = urllib2.urlopen(page)
soup = BeautifulSoup(opened, 'html.parser')
# Print the href attribute of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))
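Since the question involves a main page whose links each lead to pages with 2 or 3 further links of interest, a rough two-level sketch along the same lines might look like this; urljoin resolves relative hrefs, and the URL and the choice of which inner links are "interesting" are placeholders to adapt:

import urllib2
from urlparse import urljoin
from bs4 import BeautifulSoup

main_page = 'http://yourUrl.com'  # placeholder main page URL

soup = BeautifulSoup(urllib2.urlopen(main_page), 'html.parser')
# Level 1: every link on the main page (href=True skips <a> tags without one)
for link in soup.find_all('a', href=True):
    inner_url = urljoin(main_page, link['href'])
    inner_soup = BeautifulSoup(urllib2.urlopen(inner_url), 'html.parser')
    # Level 2: the links inside each page; filter these down to the ones you need
    for inner_link in inner_soup.find_all('a', href=True):
        print(urljoin(inner_url, inner_link['href']))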
Upvotes: 1
Reputation: 2809
Have you tried Selenium or urllib? urllib is faster than Selenium. http://useful-snippets.blogspot.in/2012/02/simple-website-crawler-with-selenium.html
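For reference, a minimal Selenium sketch of the same link-collecting step (the URL is a placeholder; Selenium drives a real browser, which is also why it is slower):

from selenium import webdriver

driver = webdriver.Firefox()  # opens a real browser window
driver.get('http://yourUrl.com')  # placeholder URL

# Collect the href of every <a> element on the page
for element in driver.find_elements_by_tag_name('a'):
    print(element.get_attribute('href'))

driver.quit()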
Upvotes: 0
Reputation: 26184
If you want to develop your solution in Python, then I can recommend the Scrapy framework.
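As a rough sketch only, a Scrapy spider for the structure described in the question might look like this (the start URL is a placeholder, and you would narrow the selectors to the links you actually want):

import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['http://yourUrl.com']  # placeholder main page

    def parse(self, response):
        # Level 1: follow every link on the main page
        for href in response.css('a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_inner)

    def parse_inner(self, response):
        # Level 2: yield each link found on the inner page
        for href in response.css('a::attr(href)').extract():
            yield {'url': response.urljoin(href)}

Running it with scrapy crawl links -o links.csv writes the results straight to a CSV file.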
As far as inserting the data into an Excel sheet is concerned, there are ways to do it directly, see for example here: Insert row into Excel spreadsheet using openpyxl in Python, but you can also write the data to a CSV file and then import it into Excel.
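A rough sketch of the direct route with openpyxl (the filename and the links list are placeholders):

from openpyxl import Workbook

links = ['http://example.com/a', 'http://example.com/b']  # hypothetical scraped links

wb = Workbook()
ws = wb.active
# append() adds one row per call; here, one URL per row
for url in links:
    ws.append([url])

wb.save('links.xlsx')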
Upvotes: 1