Oli

Reputation: 239810

Read file object as string in Python

I'm using urllib2 to read in a page. I need to run a quick regex on the source and pull out a few variables, but urllib2 returns a file-like object rather than a string.

I'm new to Python, so I'm struggling to see how to use a file object to do this. Is there a quick way to convert it into a string?

Upvotes: 33

Views: 51003

Answers (3)

t3rse

Reputation: 10124

Michael Foord (aka Voidspace) has an excellent tutorial on urllib2, which you can find here: urllib2 - The Missing Manual

What you're doing should be straightforward. Here is some sample code:

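import urllib2
import re

# urlopen() returns a file-like object; read() pulls the whole body into a string
response = urllib2.urlopen("http://www.voidspace.org.uk/python/articles/urllib2.shtml")
html = response.read()

pattern = r'(V.+space)'
wordPattern = re.compile(pattern, re.IGNORECASE)
results = wordPattern.search(html)
# search() returns None when nothing matches, so check before calling groups()
if results:
    print results.groups()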
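In Python 3, urllib2 was folded into urllib.request, and urllib.request.urlopen() likewise returns a file-like object, but its read() gives bytes, which must be decoded before a str regex can be applied. The sketch below is written for Python 3 and assumes the page is UTF-8; io.BytesIO stands in for the network response so it runs offline, and the find_first helper and sample HTML are hypothetical.

```python
import io
import re


def find_first(file_obj, pattern, encoding="utf-8"):
    """Read a binary file-like object into a str and return the first match's groups."""
    text = file_obj.read().decode(encoding)  # read() returns bytes; decode to str
    match = re.search(pattern, text, re.IGNORECASE)
    return match.groups() if match else None  # search() returns None on no match


# io.BytesIO stands in for the urlopen() response (hypothetical page content)
page = io.BytesIO(b"<title>Voidspace: urllib2 - The Missing Manual</title>")
print(find_first(page, r"(V\w+space)"))  # ('Voidspace',)
```

With a real request you would pass the object returned by urllib.request.urlopen(url) in place of page.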

Upvotes: 5

gimel

Reputation: 86364

From the documentation for file.read():

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.
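The quoted behaviour can be checked with a small in-memory file. This is a Python 3 sketch, so io.BytesIO returns bytes where the Python 2 file object described above returns a str; the sample data is arbitrary.

```python
import io

f = io.BytesIO(b"abcdefgh")  # in-memory stand-in for a file opened in binary mode

first = f.read(3)      # at most 3 bytes
rest = f.read()        # size omitted: everything up to EOF
after_eof = f.read()   # already at EOF: empty result

print(first, rest, after_eof)  # b'abc' b'defgh' b''
```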

Be aware that running a regexp over a very large string may be inefficient; consider searching line by line instead, using file.next() (a file object is its own iterator, so a plain for loop reads it one line at a time).
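A minimal sketch of the line-by-line approach, written for Python 3, where file.next() became the built-in next(f) and a for loop calls it implicitly. The search_lines helper, the pattern and the sample text are hypothetical, and io.StringIO stands in for the page. One caveat: a match that spans a newline will not be found this way.

```python
import io
import re


def search_lines(file_obj, pattern):
    """Yield (line_number, match) for every line of file_obj that matches pattern."""
    regex = re.compile(pattern)
    # Iterating over a file object reads it one line at a time
    for lineno, line in enumerate(file_obj, start=1):
        match = regex.search(line)
        if match:
            yield lineno, match


text = io.StringIO("first line\nvar x = 42;\nlast line\nvar y = 7;\n")
for lineno, m in search_lines(text, r"var (\w+) = (\d+)"):
    print(lineno, m.groups())
# 2 ('x', '42')
# 4 ('y', '7')
```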

Upvotes: 14

stesch

Reputation: 7215

You can use Python in interactive mode to search for solutions.

If f is your object, enter dir(f) to see all of its methods and attributes. One of them is read. Enter help(f.read) and it tells you that f.read() is the way to retrieve a string from a file object.
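The same exploration can be done in a script. This Python 3 sketch uses io.BytesIO as a stand-in for the response object; printing f.read.__doc__ shows the docstring that help(f.read) displays interactively.

```python
import io

f = io.BytesIO(b"hello")  # stand-in for any file-like object

# dir() lists attributes; filter out the underscore-prefixed internals
methods = [name for name in dir(f) if not name.startswith("_")]
print("read" in methods)  # True
print(f.read.__doc__)     # docstring shown by help(f.read)

data = f.read()
print(data)               # b'hello'
```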

Upvotes: 80
