Reputation: 25373
In Python, can I build a variable that acts like a string but is internally iterating through a sequence of strings?
For instance:
def function_a():
    for i in xrange(100000000):
        yield str(i)
This will iterate over a sequence of strings, and it will do it efficiently, keeping only one string in memory at a time. But what I want is something like this:
''.join([s for s in function_a()])
But I bet this just does the naïve thing: it iterates through the entire sequence and concatenates everything into one big string in memory. The other problem is that I want a variable; I don't want to expose the user to the ugly work of actually doing the join. So maybe the user would do something like:
magic_str = get_long_but_memory_efficient_str()
And then use it to efficiently print to the screen (and free up memory as it goes):
print magic_str
Or, for my real use case, stream it over HTTP to a server:
request = urllib2.Request(url, magic_str)
Apparently something like this exists. Check out the code below for efficiently streaming a file to a server (from this question).
import mmap
import urllib2

f = open('somelargefile.zip','rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)
But my case is different because I'm constructing the string that I'm streaming to the server.
Upvotes: 2
Views: 103
Reputation: 10526
Updated answer for your practical need:
>>> class MagicString(str):
...     def __init__(self, gen):
...         self.gen = gen
...     def __str__(self):
...         try:
...             return self.gen.next()
...         except StopIteration:
...             return ''  # boolean value = False
...
>>> def run_efficiently(some_function, magic_str):
...     substr = str(magic_str)
...     while substr:
...         some_function(substr)
...         substr = str(magic_str)
...
Explanation: You need a combination of:
Extending this example to print:
>>> import sys
>>> def print_without_breaks(some_string):
...     sys.stdout.write(some_string)
...
>>> s = MagicString(c for c in '12345')
>>> run_efficiently(print_without_breaks, s)
12345
You can apply the same pattern to your practical need: pass in a function that does something useful with each value returned.
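Some consumers pull data by calling read(n) on a file-like object instead of being handed each piece. Here is a minimal sketch of an adapter for that case. The class name GeneratorReader is made up for illustration, and whether a particular HTTP client accepts such an object in place of a string is something you would need to check for that client.

```python
class GeneratorReader(object):
    """Hypothetical file-like wrapper that serves read() calls from a generator of strings."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = ''

    def read(self, size=-1):
        # A missing or negative size means "return everything that is left".
        if size is None or size < 0:
            data = self._buffer + ''.join(self._chunks)
            self._buffer = ''
            return data
        # Pull chunks only until enough data is buffered for this call.
        while len(self._buffer) < size:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


reader = GeneratorReader(str(i) for i in range(12))
first = reader.read(4)   # consumes only as many chunks as needed
rest = reader.read()     # drains whatever the generator still has
```

Only the requested slice plus a small leftover buffer is held at any time, as long as the caller asks for bounded sizes.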
So maybe you don't need a variable/object at all, just some simple code that runs repeatedly until your generator raises a StopIteration exception.
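As a rough sketch of that simpler route, a plain for loop already stops when the generator is exhausted, so no wrapper class is needed. The helper name stream_to and the small limit are assumptions for illustration, and range is used so the snippet also runs on Python 3.

```python
import sys


def function_a(limit):
    # Same shape as the generator in the question, with a small illustrative limit.
    for i in range(limit):
        yield str(i)


def stream_to(sink, chunks):
    """Pass each chunk to sink as it is produced; return how many chunks were sent."""
    count = 0
    for chunk in chunks:  # the for loop handles StopIteration itself
        sink(chunk)
        count += 1
    return count


# Write each piece straight to stdout; only one piece exists at a time.
sent = stream_to(sys.stdout.write, function_a(5))

# Any callable works as the sink, for example a list's append method.
collected = []
stream_to(collected.append, function_a(3))
```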
Upvotes: 2
Reputation: 160
Not sure I understood exactly what you want, but it seems to me that you are concerned about the immutability of Python strings.
join won't create as many temporary objects as you think. If you already have a list, ''.join is going to be pretty efficient and will create just a single string.
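A small sketch of that point, using a shortened version of the generator from the question (the limit parameter is an assumption for illustration). join accepts any iterable, so the intermediate list comprehension is not required, but the result is still one complete string held in memory.

```python
def function_a(limit):
    # Shortened stand-in for the generator in the question.
    for i in range(limit):
        yield str(i)


# join consumes the generator and builds exactly one result string.
joined = ''.join(function_a(10))
```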
If you have no reason to create a list of the objects you want to concatenate, just use the cStringIO module. This is going to use the least memory.
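Here is a minimal sketch of the buffer approach. It assumes Python 3, where io.StringIO takes the place of the Python 2-only cStringIO module, and it reuses a shortened, illustrative version of the question's generator. The buffer still holds the full text until getvalue() is called, so this avoids the list, not the final string.

```python
import io


def function_a(limit):
    # Shortened stand-in for the generator in the question.
    for i in range(limit):
        yield str(i)


# Assumption: Python 3, so io.StringIO replaces cStringIO.StringIO.
buf = io.StringIO()
for piece in function_a(10):
    buf.write(piece)  # append each piece without building a list first

result = buf.getvalue()
```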
If you are still concerned, or you are a die-hard C programmer who doesn't understand how people cannot see that null-terminated sequences of bytes are the way God wanted us to deal with strings, write that portion of your code in C. This is pretty easy to do in Python compared with, for example, Java.
Upvotes: 0