Reputation: 5425
I'm new to Python and I'm having some trouble doing a simple thing.
I have an HTML page and I want to analyze it and grab some links inside a specific table.
In bash I'd use lynx --source and with grep/cut I'd have no problem... but in Python I don't know how to do it.
I thought of doing something like this:
import urllib2
data = urllib2.urlopen("http://www.my_url.com")
Doing that, I get the whole HTML page.
Then I thought I'd do:
for line in data.read():
    if "my_links" in line:
        print line
But it doesn't seem to work.
Upvotes: 2
Views: 726
Reputation: 22510
Why don't you simply use enumerate():
import urllib2

site = urllib2.urlopen(r'http://www.rom.on.ca/en/join-us/jobs')
for i, j in enumerate(site):
    if "http://www.ontario.ca" in j:  # j is the line
        print i + 1  # i starts from 0, but the first line of the HTML is line 1, so add 1
>>620
Upvotes: 0
Reputation: 791
You need XPath for this purpose in the general case. Examples: http://www.w3schools.com/xpath/xpath_examples.asp
Python has a beautiful library for this called lxml:
http://lxml.de/xpathxslt.html
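A minimal sketch of that XPath approach with lxml, run here against an inline snippet instead of a live page. The table id "links" is an invented example; adjust the expression to match the real page:

```python
from lxml import html

page = """
<html><body>
  <table id="links">
    <tr><td><a href="http://example.com/a">A</a></td></tr>
    <tr><td><a href="http://example.com/b">B</a></td></tr>
  </table>
  <table><tr><td><a href="http://other.com">other</a></td></tr></table>
</body></html>
"""

tree = html.fromstring(page)
# //table[@id="links"]//a/@href selects every href inside that one table only
hrefs = tree.xpath('//table[@id="links"]//a/@href')
print(hrefs)
```

For a real page you would feed it the string you got from urlopen().read() instead of the literal.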
Upvotes: 0
Reputation: 66739
On your code issue: data.read() with no argument returns the whole page as one big string, so

for line in data.read():

iterates character by character, not line by line.
You could do:
line = data.readline()
while line:
    print line
    line = data.readline()
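Since the object urlopen() returns is file-like, you can also iterate it directly and get one line at a time. A quick sketch, using StringIO to stand in for the response object:

```python
from io import StringIO

# StringIO simulates the file-like object urlopen() returns
data = StringIO(u"first line\nmy_links here\nlast line\n")

# iterating a file-like object yields one line per step, newline included
matches = [line for line in data if "my_links" in line]
print(matches)
```

That is exactly the loop the question was aiming for, just without the .read() call.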
That part is not exactly an answer to your question, but I suggest that you use BeautifulSoup.
import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.my_url.com"
data = urllib2.urlopen(url).read()
soup = BeautifulSoup(data)
all_links = soup.findAll('a')  # findAll returns every <a> tag; find would return only the first
# you can also look for specific links
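To grab only the links inside one specific table, as the question asks, you can find the table first and then search within it. A sketch using the newer bs4 package (with the old BeautifulSoup 3 module the import is "from BeautifulSoup import BeautifulSoup", but the calls are the same); the table id "jobs" and the sample HTML are assumptions for illustration:

```python
from bs4 import BeautifulSoup

page = """
<table id="jobs">
  <tr><td><a href="http://www.ontario.ca/one">one</a></td></tr>
  <tr><td><a href="http://www.ontario.ca/two">two</a></td></tr>
</table>
<a href="http://elsewhere.example">not in the table</a>
"""

soup = BeautifulSoup(page, "html.parser")
# narrow the search down to the one table first ...
table = soup.find("table", {"id": "jobs"})
# ... then collect only the hrefs found inside it
links = [a["href"] for a in table.findAll("a")]
print(links)
```

The link outside the table is never visited, which is what grep/cut over the raw source cannot easily guarantee.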
Upvotes: 1