Reputation: 706
I'm trying to pull information from a log file posted online and read through the output. The only information I really need is at the end of the file. These files are pretty big, and storing the entire socket output in a variable and reading through it consumes a lot of memory. Is there a way to read the socket from bottom to top?
What I currently have:
socket = urllib.urlopen(urlString)
OUTPUT = socket.read()
socket.close()
OUTPUT = OUTPUT.split("\n")
for line in OUTPUT:
    if "xxxx" in line:
        print line
I am using Python 2.7. I basically want to read only the last 30 or so lines of the socket's output.
Upvotes: 2
Views: 146
Reputation: 1579
What you want in this use case is an HTTP Range request. Here is a tutorial I found:
http://stuff-things.net/2015/05/13/web-scale-http-tail/
I should clarify: the advantage of getting the size with a HEAD request and then doing a Range request is that you do not have to transfer all the content. You mentioned that your files are pretty big, so this is going to be the best solution :)
edit: added this code below...
Here is a simplified demo of that blog article, translated into Python. Please note this will not work with all HTTP servers! More comments inline:
"""
illustration of how to 'tail' a file using http. this will not work on all
webservers! if you need an http server to test with you can try the
rangehttpserver module:
$ pip install requests
$ pip install rangehttpserver
$ python -m RangeHTTPServer
"""
import requests
TAIL_SIZE = 1024
url = 'http://localhost:8000/lorem-ipsum.txt'
response = requests.head(url)
# not all servers return content-length in head, for some reason
assert 'content-length' in response.headers, 'Content length unknown- out of luck!'
# check the resource length and construct a request header for that range
full_length = int(response.headers['content-length'])
assert full_length > TAIL_SIZE
# byte offsets are zero-based and inclusive, so the last byte is full_length - 1
headers = {
    'range': 'bytes={}-{}'.format(full_length - TAIL_SIZE, full_length - 1)
}
# Make a get request, with the range header
response = requests.get(url, headers=headers)
assert 'accept-ranges' in response.headers, 'Accept-ranges response header missing'
assert response.headers['accept-ranges'] == 'bytes'
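# a server that honours the range replies with 206 Partial Content
assert response.status_code == 206, 'Server ignored the range header'
# compare raw bytes, not decoded text: content-length counts bytes
assert len(response.content) == TAIL_SIZE
# Without the range header you get the entire file
response = requests.get(url)
assert len(response.content) == full_length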
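To tie this back to the question, here is a rough sketch that fetches only the tail of the log and keeps the last 30 lines, using just the standard library so it also runs on Python 2.7. It uses a suffix range (`bytes=-N`), which asks for the last N bytes and so skips the HEAD request. The names `fetch_tail` and `tail_lines` and the 8192-byte default are my own placeholders, not anything from the blog post, and the same caveat applies: a server that ignores the range header will send the whole file.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2.7


def tail_lines(data, count, drop_first=False):
    """Return up to the last `count` lines of a byte string, decoded as UTF-8."""
    if count <= 0:
        return []
    lines = data.decode("utf-8", "replace").splitlines()
    if drop_first:
        # A byte range usually starts mid-line, so the first line is a fragment.
        lines = lines[1:]
    return lines[-count:]


def fetch_tail(url, tail_bytes=8192):
    """Request only the last `tail_bytes` bytes via a suffix byte range.

    Returns (data, partial). `partial` is True when the server answered
    206 Partial Content; on a plain 200 the body is the whole file.
    """
    request = Request(url, headers={"Range": "bytes=-{}".format(tail_bytes)})
    response = urlopen(request)
    try:
        partial = response.getcode() == 206
        return response.read(), partial
    finally:
        response.close()


# Possible usage, with urlString and "xxxx" as in the question:
#
#     data, partial = fetch_tail(urlString)
#     for line in tail_lines(data, 30, drop_first=partial):
#         if "xxxx" in line:
#             print(line)
```

If 30 lines can be longer than 8192 bytes in total, pass a larger `tail_bytes`; the sketch does not retry with a bigger range on its own.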
Upvotes: 2