Zitrax

Reputation: 20344

String manipulation in Python

I am converting some code from another language to Python. The code reads a rather large file into a string and then modifies it in place by index, like:

str[i] = 'e'

This does not work directly in Python because strings are immutable. What is the preferred way of doing this in Python?

I have seen the string.replace() function, but it returns a copy of the string, which seems wasteful when the string holds an entire file.

Upvotes: 1

Views: 2248

Answers (4)

Chris Upchurch

Reputation: 15537

Others have answered the string manipulation part of your question, but consider whether it would be better to parse the file and modify the data structure the text represents, rather than manipulating the text directly.

Upvotes: 1

Nicholas Riley

Reputation: 44361

Assuming you're not using a variable-length text encoding such as UTF-8, you can use array.array:

>>> import array
>>> a = array.array('c', 'foo')
>>> a[1] = 'e'
>>> a
array('c', 'feo')
>>> a.tostring()
'feo'
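
The session above is Python 2: the 'c' typecode was removed from array.array in Python 3, and tostring() is gone in recent versions. A minimal Python 3 sketch of the same idea, assuming the data is handled as bytes, could use a bytearray instead (a swapped-in technique, not what the answer itself used):

```python
# Python 3 sketch: bytearray is a mutable sequence of bytes.
data = bytearray(b"foo")      # in practice the bytes would come from a file opened in "rb" mode
data[1] = ord("e")            # items are ints in Python 3, so assign a byte value
result = bytes(data)          # immutable copy, if one is needed
text = data.decode("ascii")   # decoding assumes a single-byte encoding
print(result, text)
```

As with array.array, indexing is by byte, so the same caveat about variable-length encodings applies.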

But since you're dealing with the contents of a file, mmap should be more efficient:

>>> f = open('foo', 'r+')
>>> import mmap
>>> m = mmap.mmap(f.fileno(), 0)
>>> m[:]
'foo\n'
>>> m[1] = 'e'
>>> m[:]
'feo\n'
>>> exit()
% cat foo
feo
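
For Python 3, roughly the same mmap session might look like the sketch below. The temporary file is only there to make the example self-contained, and it assumes the file is opened in binary mode, since a Python 3 mmap works with bytes rather than str:

```python
import mmap
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo\n")

# Map the whole file (length 0 means "entire file") and modify it in place.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        m[1:2] = b"e"    # slice assignment takes bytes of the same length
        m.flush()        # push the change back to the file

# Read the file back to confirm the change reached the disk.
with open(path, "rb") as f:
    contents = f.read()

os.remove(path)
print(contents)
```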

Here's a quick benchmarking script (you'll need to replace dd with something else for non-Unix OSes):

import os, time, array, mmap

def modify(s):
    for i in xrange(len(s)):
        s[i] = 'q'

def measure(func):
    start = time.time()
    func(open('foo', 'r+'))
    print func.func_name, time.time() - start

def do_split(f):
    l = list(f.read())
    modify(l)
    return ''.join(l)

def do_array(f):
    a = array.array('c', f.read())
    modify(a)
    return a.tostring()

def do_mmap(f):
    m = mmap.mmap(f.fileno(), 0)
    modify(m)

# Create a 5 MB test file named 'foo' filled with random bytes.
os.system('dd if=/dev/random of=foo bs=1m count=5')

measure(do_mmap)
measure(do_array)
measure(do_split)

The output on my several-year-old laptop matches my intuition:

5+0 records in
5+0 records out
5242880 bytes transferred in 0.710966 secs (7374304 bytes/sec)
do_mmap 1.00865888596
do_array 1.09792494774
do_split 1.20163106918

So mmap is slightly faster, but the three approaches are within about 20% of each other. If you're seeing a huge difference, try using cProfile to see where the time goes.
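
As a rough illustration of the cProfile suggestion, here is a Python 3 sketch. The do_bytearray function is a hypothetical stand-in for one of the benchmarked approaches, and the input is generated in memory rather than read from the 'foo' file:

```python
import cProfile
import io
import pstats

def modify(s):
    # Same per-element loop as the benchmark, written for Python 3.
    for i in range(len(s)):
        s[i] = ord("q")

def do_bytearray(data):
    # Hypothetical stand-in for one of the benchmarked approaches.
    buf = bytearray(data)
    modify(buf)
    return bytes(buf)

# Profile a single call.
profiler = cProfile.Profile()
profiler.enable()
out = do_bytearray(b"x" * 100000)
profiler.disable()

# Collect the report, sorted by cumulative time, into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists each function with its call count and time, which should make it clear whether the loop itself or the surrounding reads and copies dominate.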

Upvotes: 12

vartec

Reputation: 134721

Try:

sl = list(s)
sl[i] = 'e'
s = ''.join(sl)

Upvotes: 0

Can Berk Güder

Reputation: 113370

l = list(str)
l[i] = 'e'
str = ''.join(l)
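
The same list-and-join idea can be wrapped in a small helper, which also avoids shadowing the built-in str. This is only a sketch; replace_char is a hypothetical name, not something from the standard library:

```python
def replace_char(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    chars = list(text)       # strings are immutable, lists are not
    chars[index] = new_char
    return "".join(chars)    # build the new string in one pass

s = replace_char("foo", 1, "e")
print(s)
```

Each call copies the whole string, so for many edits to a large file it is probably better to convert to a list once, make all the changes, and join at the end.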

Upvotes: 9
