jimstandard

Reputation: 1147

opening files from a website

I asked a similar question yesterday but I included some code that basically took my question on a different tangent than I had intended. So I shall try again.

I am rewriting a Python script that crawls a website to find a few hundred text files. I have no interest in any content of a file beyond its second line. Previously I would download all of the files and then loop through them to extract the second line. I would now like to open each file as my script discovers it, grab the second line, and close it, without first downloading it to my hard drive and then opening it.

So basically, is there a way I can open a file at www.example.com/123456.txt, take its second line, and copy it to an array or something, without downloading the file and then opening it?

Upvotes: 0

Views: 133

Answers (3)

etuardu

Reputation: 5516

You can't retrieve the first N lines (or perform a line seek), but if the web server supports the Range header you can retrieve the first N bytes of the file (a byte seek).

If you know the maximum length of a line, you could do this:

>>> import urllib2
>>> maxlinelength = 127 # nb: in terms of bytes, line ending included
>>> myHeaders = {'Range':'bytes=0-'+str(2*maxlinelength-1)} # byte ranges are inclusive: enough bytes for two full lines
>>> req = urllib2.Request('http://www.constitution.org/gr/pericles_funeral_oration.txt', headers=myHeaders)
>>> partial = urllib2.urlopen(req)
>>> partial.readline() # first line discarded
>>> yourvar = partial.readline()
>>> yourvar # this is the second line:
'from Thucydides (c.460/455-399 BCE), \r\r\n'
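
For readers on Python 3, where urllib2 was folded into urllib.request, the same idea might be sketched as below. The helper names (build_range_request, second_line_of, fetch_second_line) are hypothetical, and the default byte budget is an assumption that you would need to size so it covers the first two lines. A server that ignores the Range header replies with the full body instead of a partial one, so the sketch also caps the read at the same budget.

```python
import urllib.request


def build_range_request(url, max_bytes):
    """Build a request asking for only the first max_bytes bytes of url."""
    # Byte ranges are inclusive, so 0..max_bytes-1 is exactly max_bytes bytes.
    headers = {"Range": "bytes=0-%d" % (max_bytes - 1)}
    return urllib.request.Request(url, headers=headers)


def second_line_of(chunk):
    """Return the second line of a bytes chunk, or None if there is none."""
    # Split on "\n" only, mirroring readline(); stray "\r" is stripped after.
    lines = chunk.split(b"\n")
    if len(lines) < 2:
        return None
    # Note: if the byte budget was too small, this line may be truncated.
    return lines[1].rstrip(b"\r")


def fetch_second_line(url, max_bytes=256):
    """Fetch at most max_bytes bytes and return the second line (assumed budget)."""
    req = build_range_request(url, max_bytes)
    with urllib.request.urlopen(req) as resp:
        # Cap the read in case the server ignored Range and sent everything.
        chunk = resp.read(max_bytes)
    return second_line_of(chunk)
```

Calling fetch_second_line('http://www.example.com/123456.txt') would then return the second line as bytes, or None if fewer than two lines arrived within the budget.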

Upvotes: 1

Facundo Casco

Reputation: 10585

You could try something like urllib2.urlopen('url').read().splitlines()[1], but that would download the entire file into memory.
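
If holding the whole file in memory is a concern, one option is to read the response line by line and stop after the second line. This is a minimal sketch for Python 3 (where urllib2 became urllib.request); second_line and second_line_from_url are hypothetical helper names. Closing the response early means the rest of the body is never read by the script, although network buffering may still have transferred more than two lines.

```python
import urllib.request


def second_line(stream):
    """Return the second line of a binary file-like object, or None if absent."""
    stream.readline()           # discard the first line
    line = stream.readline()    # b"" means the stream ended before line two
    return line.rstrip(b"\r\n") if line else None


def second_line_from_url(url):
    """Open url, read only up to the end of line two, then close the response."""
    with urllib.request.urlopen(url) as resp:
        return second_line(resp)
```

Appending each result to a list as the crawler discovers a URL, for example results.append(second_line_from_url(url)), keeps everything off the disk.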

Upvotes: 1

Amber

Reputation: 526573

Well, you could use urllib2.urlopen() to read the file contents into memory, extract the second line, and then immediately discard the contents, all without ever touching your disk.

You are going to have to download the contents over the internet, though.

Upvotes: 2
