Xantium

Reputation: 11605

How to read one line with urllib.request

I am trying to read one line of a web page with the urllib.request module.

I have tried readline(), readlines() and read() but I cannot make it read just one line.

How can I do this?

I only want to read line 581 of python.org.

My script at the moment is:

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')
x = int('581')
get_ver = get_page.readline(x)

print("Current versions are: ", get_ver)

The result is:

Current versions are:  b'<!doctype html>\n'

The result is always the same even if I change the number.

So how do I read only line 581?

Upvotes: 0

Views: 4771

Answers (2)

Xantium

Reputation: 11605

This is one way to do it using readlines().

Here is the working script:

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')
get_ver = get_page.readlines()

print("Current versions are: ", get_ver[580])

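It wasn't working because readline(x) treats x as a maximum number of bytes to read, not as a line number. The first line is shorter than 581 bytes, so it always came back whole. readlines() instead returns a list with one entry per line, which you can index. The index is 580, not 581, because list indices start at 0.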
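A minimal offline sketch of the off-by-one point. It uses io.BytesIO as a stand-in for the HTTP response (an assumption so the example runs without network access; both are binary file-like objects with a readlines() method), and the three sample lines are made up.

```python
import io

# Stand-in for the HTTP response: a binary file-like object with made-up lines
page = io.BytesIO(b"line 1\nline 2\nline 3\n")

lines = page.readlines()  # list of bytes objects, one per line
n = 3                     # line number as a person counts it (starting at 1)
print(lines[n - 1])       # list indices start at 0, so line n is lines[n - 1]
```

The same lines[n - 1] indexing is what turns line 581 into get_ver[580] in the script above.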

Upvotes: 0

ShmulikA

Reputation: 3744

The argument to readline() is a size limit, so readline(581) reads up to 581 bytes of the next line; it does not jump to line 581.
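A small offline sketch of that size limit, with io.BytesIO standing in for the response object (an assumption so it runs without a network; the HTTP response's readline() also accepts a byte limit). The second sample line is made up.

```python
import io

# Stand-in for the response body: the doctype line plus a made-up second line
page = io.BytesIO(b'<!doctype html>\n<html lang="en">\n')

# The limit is larger than the first line, so the whole first line is returned
first = page.readline(581)
print(first)

# A limit smaller than the line truncates it; the rest of the line stays unread
part = page.readline(5)
rest = page.readline()
print(part, rest)
```

Changing the number only changes how many bytes of the *current* line may be returned, which is why the question's output never changed.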

To get the n-th line while reading no more of the response than necessary, read and discard the lines before it (look into HTTP Range requests if you need better performance):

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')

def get_nth_line(resp, n):
    # Read and discard lines 1 .. n-1 (lines are counted from 1)
    for _ in range(n - 1):
        resp.readline()
    # The next readline() returns line n
    return resp.readline()

print(get_nth_line(get_page, 574))

Output (at the time of writing; the line number shifts whenever the page changes):

b'<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - <a href="/downloads/release/python-2713/">Python 2.7.13</a></p>\n'
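The same skip-ahead can also be written with itertools.islice, because iterating over a binary file-like object yields one line at a time. This is a minimal sketch, again with io.BytesIO standing in for the response (with the real page you would pass get_page). Returning None for a stream with fewer than n lines is a choice made here; the loop version returns b'' in that case.

```python
import io
from itertools import islice

def get_nth_line(resp, n):
    """Return line n (counting from 1) of a binary file-like object,
    or None if it has fewer than n lines. Earlier lines are consumed."""
    # islice skips the first n - 1 lines without building a list;
    # next() takes the following line, or the default if none is left
    return next(islice(resp, n - 1, None), None)

page = io.BytesIO(b"line 1\nline 2\nline 3\n")
print(get_nth_line(page, 2))
```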

Suggestions

  1. Use requests for HTTP requests instead of urllib:

requests.get('https://www.python.org').text

  2. Use regex or bs4 (BeautifulSoup) to parse the page and extract the Python version.
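bs4 is a third-party package. If you would rather avoid the dependency, here is a sketch using the standard library's html.parser.HTMLParser instead (a swapped-in technique, not bs4's API). The class name ReleaseLinkParser is hypothetical, and matching on the "/downloads/release/" href prefix assumes python.org's markup looks like the sample output shown earlier.

```python
from html.parser import HTMLParser

class ReleaseLinkParser(HTMLParser):
    """Collect the text of <a> tags whose href starts with /downloads/release/."""

    def __init__(self):
        super().__init__()
        self.in_release_link = False
        self.versions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("/downloads/release/"):
            self.in_release_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_release_link = False

    def handle_data(self, data):
        # Only keep text that appears inside a release link
        if self.in_release_link:
            self.versions.append(data)

# Sample line taken from the python.org output shown earlier
sample = ('<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - '
          '<a href="/downloads/release/python-2713/">Python 2.7.13</a></p>')
parser = ReleaseLinkParser()
parser.feed(sample)
print(parser.versions)
```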

Requests & Regex Example

import re, requests

resp = requests.get('http://www.python.org')
# regex might need adjustments
ver_regex = re.compile(r'<a href\="/downloads/release/python\-2\d+/">(.*?)</a>')
py2_ver = ver_regex.search(resp.text).group(1)
print(py2_ver)

Output:

Python 2.7.13
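Because the live page keeps changing, it can help to check the pattern against a fixed string first. This sketch runs the same regex on the sample line from the earlier output, and guards against search() returning None when nothing matches (which would otherwise raise AttributeError on .group()).

```python
import re

# Same pattern as in the example above
ver_regex = re.compile(r'<a href\="/downloads/release/python\-2\d+/">(.*?)</a>')

# Sample line from the python.org output shown earlier
line = ('<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - '
        '<a href="/downloads/release/python-2713/">Python 2.7.13</a></p>')

match = ver_regex.search(line)
# search() returns None when nothing matches, so check before calling group()
py2_ver = match.group(1) if match else None
print(py2_ver)
```

The first link is skipped because "python-362" does not start with 2, so the capture is the text of the 2.7 link.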

Upvotes: 4
