jmtoung

Reputation: 1012

Modifying part of a list in place using list comprehensions in Python

I have a list that looks like this:

test = ['A','B','C','D D','E E','F F']

I would like test to become the following (that is, with the spaces removed):

test = ['A', 'B', 'C', 'DD', 'EE', 'FF']

I used a list comprehension in Python to achieve this:

>>> import re
>>> [re.sub(' ','',i) for i in test]
['A', 'B', 'C', 'DD', 'EE', 'FF']

My question is: what if I explicitly DO NOT want re.sub(' ','',i) to run on the first three elements of my list? I only want the re.sub function to run on 'D D', 'E E', and 'F F'.

Is the following approach efficient? I understand that a list comprehension uses extra memory because Python builds a new list.

test[3:] = [re.sub(' ','',i) for i in test[3:]]
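The slice assignment (with test on both sides) does modify the existing list object: the right-hand side builds a temporary list (and test[3:] on the right makes a shallow copy of that slice), but the assignment writes the results back into the original list. A minimal sketch that checks this with id() and a second reference; the names alias and original_id are illustrative only.

```python
import re

test = ['A', 'B', 'C', 'D D', 'E E', 'F F']
alias = test              # a second name bound to the same list object
original_id = id(test)

# Slice assignment replaces elements 3 onward inside the existing list;
# the comprehension on the right is evaluated into a temporary list first.
test[3:] = [re.sub(' ', '', s) for s in test[3:]]

print(test)                     # ['A', 'B', 'C', 'DD', 'EE', 'FF']
print(id(test) == original_id)  # True: still the same list object
print(alias is test)            # True: the other reference sees the change
```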

Or should I just loop over the indices of the elements I want to modify, like this:

for i in range(3, len(test)):
    print(i)
    test[i] = re.sub(' ', '', test[i])
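Unlike the slice version, the explicit loop never builds a copy of the slice; it replaces one element at a time. Below is a minimal sketch of the same loop wrapped in a function, assuming str.replace (a literal substitution) in place of re.sub; the helper name strip_spaces_from and its start parameter are hypothetical.

```python
def strip_spaces_from(items, start):
    """Remove spaces from items[start:] in place (hypothetical helper).

    Uses str.replace instead of re.sub, since the pattern is a literal space.
    No temporary copy of the slice is made; only the new strings are created.
    """
    for i in range(start, len(items)):
        items[i] = items[i].replace(' ', '')


test = ['A', 'B', 'C', 'D D', 'E E', 'F F']
strip_spaces_from(test, 3)
print(test)  # ['A', 'B', 'C', 'DD', 'EE', 'FF']
```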

Upvotes: 1

Views: 252

Answers (3)

thefourtheye

Reputation: 239683

Of re.sub, str.replace and str.translate, str.replace is the fastest here, so use str.replace.

Here is a little timing comparison.

import re

def test1():
    test = ['A','B','C','D D','E E','F F']
    test[3:] = [re.sub(' ','',i) for i in test[3:]]

def test2():
    test = ['A','B','C','D D','E E','F F']
    test[3:] = [i.replace(" ", "") for i in test[3:]]

def test3():
    test = ['A','B','C','D D','E E','F F']
    # Python 2 only: str.translate(None, chars) deletes the given chars
    test[3:] = [item.translate(None, " ") for item in test[3:]]

from timeit import timeit
print(timeit("test1()", "from __main__ import test1"))
print(timeit("test2()", "from __main__ import test2"))
print(timeit("test3()", "from __main__ import test3"))

Output on my machine (total seconds for timeit's default of 1,000,000 calls each):

3.96201109886
0.985305070877
1.11600804329

Note: As @roippi mentioned in the comments, str.translate does not work in this form in Python 3.x, so leave it out of the comparison if you are using Python 3.x.
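For Python 3, a minimal sketch of the equivalent str.translate call, assuming Python 3: str.maketrans with a third argument builds a table that deletes the listed characters. It only checks that the three approaches agree on the sample data; it does not reproduce the timings above.

```python
import re

test = ['A', 'B', 'C', 'D D', 'E E', 'F F']

# Python 3: str.translate takes a single table; str.maketrans('', '', ' ')
# maps the space character to None, which means "delete it".
delete_spaces = str.maketrans('', '', ' ')

via_sub = [re.sub(' ', '', s) for s in test[3:]]
via_replace = [s.replace(' ', '') for s in test[3:]]
via_translate = [s.translate(delete_spaces) for s in test[3:]]

print(via_sub == via_replace == via_translate)  # True
print(via_translate)                            # ['DD', 'EE', 'FF']
```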

Upvotes: 2

roippi

Reputation: 25974

My question is - what if I explicitly DO NOT want re.sub(' ','',i) to run on the first three elements of my list?

Okay, answering that question first:

You can use enumerate and a conditional expression to specify the behavior you want for i < 3 and i >= 3:

>>> [x if i<3 else re.sub(' ','',x) for i,x in enumerate(test)]
['A', 'B', 'C', 'DD', 'EE', 'FF']

Note that this simple sub operation can be handled more straightforwardly by str.replace.
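A minimal sketch of the same enumerate/conditional-expression comprehension with str.replace swapped in for re.sub, so no regex module is needed:

```python
test = ['A', 'B', 'C', 'D D', 'E E', 'F F']

# Keep the first three items as they are; strip spaces from the rest.
# str.replace does a literal substitution of ' ' with ''.
result = [x if i < 3 else x.replace(' ', '') for i, x in enumerate(test)]
print(result)  # ['A', 'B', 'C', 'DD', 'EE', 'FF']
```

This builds a new list and leaves test unchanged; assigning it back with test[:] = result would update the original list object in place.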

(I won't discuss whether this kind of optimization is worthwhile, except to say that the time saved by skipping re.sub on the first three elements is minuscule.)

Upvotes: 1

NPE

Reputation: 500933

First of all, it sounds like you're optimizing prematurely.

Secondly, you can express your requirements with a single list comprehension:

In [5]: test = ['A','B','C','D D','E E','F F']

In [6]: [t if i < 3 else re.sub(' ', '', t) for (i, t) in enumerate(test)]
Out[6]: ['A', 'B', 'C', 'DD', 'EE', 'FF']

Finally, my advice would be to focus on correctness first, then on readability. Once you've achieved those, profile the code to see where the bottlenecks are, and only then optimize for performance.
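In that spirit, here is a minimal sketch of measuring before optimizing, using the standard-library timeit module to compare stripping every item against skipping the first three. The statement strings, repeat count and run count are arbitrary choices, no particular result is implied, and the two statements only produce the same list here because the first three items contain no spaces.

```python
import timeit

setup = "test = ['A', 'B', 'C', 'D D', 'E E', 'F F']"

# The two candidate statements, passed to timeit as strings.
candidates = {
    'all items': "[s.replace(' ', '') for s in test]",
    'skip first 3': "[s if i < 3 else s.replace(' ', '') for i, s in enumerate(test)]",
}

timings = {}
for label, stmt in candidates.items():
    # Best of 3 repeats of 100,000 runs each; numbers vary by machine.
    timings[label] = min(timeit.repeat(stmt, setup, repeat=3, number=100000))
    print('%-13s %.4f s' % (label, timings[label]))
```

Taking the minimum of several repeats is a common way to reduce noise from other processes on the machine.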

Upvotes: 3
