Reputation: 11605
I am trying to read one line of a web page with the urllib.request module.
I have tried readline(), readlines() and read(), but I cannot make it read just one line.
How can I do this?
I am just trying to read the 581st line from python.org.
My script at the moment is:
import urllib.request
get_page = urllib.request.urlopen('https://www.python.org')
x = int('581')
get_ver = get_page.readline(x)
print("Currant Versions Are: ", get_ver)
And the result of this is:
Currant Versions Are: b'<!doctype html>\n'
The result is always the same even if I change the number.
So how do I read just the 581st line?
Upvotes: 0
Views: 4771
Reputation: 11605
This is one way to do it, using readlines().
Here is the working script:
import urllib.request
get_page = urllib.request.urlopen('https://www.python.org')
get_ver = get_page.readlines()
print("Currant Versions Are: ", get_ver[580])
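The original script did not work because the argument to readline() is a size limit in bytes, not a line number, so it always returned the first line. readlines() returns a list of all the lines, which can then be indexed. The index is 580 rather than 581 because list indices start at 0.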
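To see the difference without touching the network, here is a minimal sketch that uses io.BytesIO as a stand-in for the HTTP response. The sample bytes are made up for illustration and are not the real python.org page; the assumption is that the response object treats the readline() size argument the same way as any other buffered binary reader.

```python
import io

# Hypothetical stand-in for the response body; not the real python.org page.
sample = b"<!doctype html>\n<html>\n<head>\n"

# readline(size) returns at most `size` bytes of the *next* line,
# so a large number still yields only the first line.
first = io.BytesIO(sample).readline(581)
capped = io.BytesIO(sample).readline(5)

# readlines() returns a list, so the n-th line lives at index n - 1.
lines = io.BytesIO(sample).readlines()
second_line = lines[2 - 1]

print(first)
print(capped)
print(second_line)
```

Note that readlines() downloads and stores the whole page, which is fine for a small page but wasteful if only an early line is needed.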
Upvotes: 0
Reputation: 3744
You are reading up to a limit of 574 bytes, not line 574.
The following returns the n-th line while trying to minimize the amount of data read from the server (check out HTTP range requests if you need better performance):
import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')

def get_nth_line(resp, n):
    # Read and discard the first n - 1 lines
    i = 1
    while i < n:
        resp.readline()
        i += 1
    # The next line read is line n (counting from 1)
    return resp.readline()

print(get_nth_line(get_page, 574))
outputs:
b'<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - <a href="/downloads/release/python-2713/">Python 2.7.13</a></p>\n'
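The same skip-then-read idea can also be sketched with itertools.islice, relying on the fact that file-like objects iterate line by line. This is only an alternative illustration, not the answer's original code: io.BytesIO stands in for the response here, and returning None for a missing line is a choice made for this example.

```python
import io
from itertools import islice

def get_nth_line(resp, n):
    """Return line n (1-based) of a file-like object, or None if it has fewer lines."""
    # islice lazily skips the first n - 1 lines; next() then pulls exactly one more.
    return next(islice(resp, n - 1, n), None)

# Hypothetical three-line body used in place of a real HTTP response.
fake_resp = io.BytesIO(b"one\ntwo\nthree\n")
print(get_nth_line(fake_resp, 2))
```

With a real response, passing the object returned by urllib.request.urlopen() in place of fake_resp should work the same way, since it is also iterable by line.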
Alternatively, using requests instead of urllib, you can extract the version with a regular expression rather than relying on a line number:
import re
import requests

resp = requests.get('http://www.python.org')
# regex might need adjustments
ver_regex = re.compile(r'<a href="/downloads/release/python-2\d+/">(.*?)</a>')
py2_ver = ver_regex.search(resp.text).group(1)
print(py2_ver)
outputs:
Python 2.7.13
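As a rough offline check of the regex approach, the sketch below runs a slightly generalized pattern against the line shown in the earlier output instead of a live page. The pattern and the variable names are illustrative assumptions; the real page markup may have changed since, so the expression would likely need adjusting.

```python
import re

# Sample line copied from the output shown earlier in this answer.
html = ('<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - '
        '<a href="/downloads/release/python-2713/">Python 2.7.13</a></p>\n')

# Capture the link text of every release link, whatever the version number.
release_regex = re.compile(r'<a href="/downloads/release/python-\d+/">(.*?)</a>')
versions = release_regex.findall(html)
print(versions)
```

Using findall() rather than search() collects every matching release link, so a page with no match gives an empty list instead of raising an AttributeError on .group(1).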
Upvotes: 4