James
James

Reputation: 4307

Finding and replacing elements in a list

I have to search through a list and replace all occurrences of one element with another. So far my attempts in code are getting me nowhere, what is the best way to do this?

For example, suppose my list has the following integers

a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]

and I need to replace all occurrences of the number 1 with the value 10 so the output I need is

a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

Thus my goal is to replace all instances of the number 1 with the number 10.

Upvotes: 428

Views: 1391275

Answers (11)

cottontail
cottontail

Reputation: 23131

In many cases, defining a replacer function and calling it in a loop is very readable. It's very useful if values need to be replaced using some rule. For example, to replace values depending on whether it's divisible by 15, 3 or 5, one can define a function that checks conditions and returns an appropriate value.

def fizzbuzz(num):
    if num % 15 == 0:
        return 'FizzBuzz'
    elif num % 3 == 0:
        return 'Fizz'
    elif num % 5 == 0:
        return 'Buzz'
    else:
        return num

a = [1, 2, 3, 4, 5, 10, 15, 1]
a[:] = (fizzbuzz(n) for n in a)
# or 
a[:] = map(fizzbuzz, a)
a # [1, 2, 'Fizz', 4, 1, 'Buzz', 'Fizz', 2, 'Fizz', 1, 1]

One thing to note is that (at least as of python 3.10) if a function needs to be called on every element in a list, then map() is faster than a comprehension.1

This difference is even more pronounced for built-in methods such as dict.get().2 So @roipoussiere's solution can be made twice as fast by simply mapping it like the following.

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacer = {1:10, 2:20, 3:'foo'}.get
a[:] = map(replacer, a, a)        # in-place replacement
a1 = [*map(replacer, a, a)]       # new list

1 map() is ~20% faster than calling a function in a list comprehension in the example below.

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]*10000
%timeit [fizzbuzz(n) for n in a]
# 15.1 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit list(map(fizzbuzz, a))
# 12.7 ms ± 27.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

2 When replacing values using dict.get(), mapping it is ~2 times faster than calling it in a list comprehension.

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]*10000
replacer = {1:10, 2:20, 3:'foo'}.get

%timeit [replacer(n,n) for n in a]
# 4.64 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit list(map(replacer, a, a))
# 2.21 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Upvotes: 1

kxr
kxr

Reputation: 5529

On long lists and rare occurrences its about 3x faster using list.index() - compared to single step iteration methods presented in the other answers.

def list_replace(lst, old=1, new=10):
    """replace list elements (inplace)"""
    i = -1
    try:
        while True:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass

Upvotes: 15

ghostdog74
ghostdog74

Reputation: 342363

You can use the built-in enumerate to get both index and value while iterating the list. Then, use the value to test for a condition and the index to replace that value in the original list:

>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for i, n in enumerate(a):
...   if n == 1:
...      a[i] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

Upvotes: 319

Jerry Chen
Jerry Chen

Reputation: 379

I might be a dumb-dumb, but I would write a separate, simple function for this:

def convertElements( oldlist, convert_dict ):
  newlist = []
  for e in oldlist:
    if e in convert_dict:
      newlist.append(convert_dict[e])
    else:
      newlist.append(e)
  return newlist

And then call this as needed like so:

a = [1,2,3,4,5,1,2,3,4,5,1]
a_new = convertElements(a, {1: 10})
## OUTPUT: a_new=[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]

Upvotes: 0

outis
outis

Reputation: 77400

Try using a list comprehension and a conditional expression.

>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]

Upvotes: 717

roipoussiere
roipoussiere

Reputation: 5956

If you have several values to replace, you can also use a dictionary:

a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacements = {1:10, 2:20, 3:'foo'}
replacer = replacements.get  # For faster gets.

print([replacer(n, n) for n in a])

> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]

Note that this approach works only if the elements to be replaced are hashable. This is because dict keys are required to be hashable.

Upvotes: 88

dawg
dawg

Reputation: 103834

The answers for this old but relevant question are wildly variable in speed.

The fastest of the solution posted by kxr.

However, this is even faster and otherwise not here:

def f1(arr, find, replace):
    # fast and readable
    base=0
    for cnt in range(arr.count(find)):
        offset=arr.index(find, base)
        arr[offset]=replace
        base=offset+1

Here is timing for the various solutions. The faster ones are 3X faster than accepted answer and 5X faster than the slowest answer here.

To be fair, all methods needed to do inlace replacement of the array sent to the function.

Please see timing code below:

def f1(arr, find, replace):
    # fast and readable
    base=0
    for cnt in range(arr.count(find)):
        offset=arr.index(find, base)
        arr[offset]=replace
        base=offset+1
        
def f2(arr,find,replace):
    # accepted answer
    for i,e in enumerate(arr):
        if e==find: 
            arr[i]=replace
        
def f3(arr,find,replace):
    # in place list comprehension
    arr[:]=[replace if e==find else e for e in arr]
    
def f4(arr,find,replace):
    # in place map and lambda -- SLOW
    arr[:]=list(map(lambda x: x if x != find else replace, arr))
    
def f5(arr,find,replace):
    # find index with comprehension
    for i in [i for i, e in enumerate(arr) if e==find]:
        arr[i]=replace
        
def f6(arr,find,replace):
    # FASTEST but a little les clear
    try:
        while True:
            arr[arr.index(find)]=replace
    except ValueError:
        pass    

def f7(lst, old, new):
    """replace list elements (inplace)"""
    i = -1
    try:
        while 1:
            i = lst.index(old, i + 1)
            lst[i] = new
    except ValueError:
        pass
    
    
import time     

def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
    """Generate a Perl style function benchmark"""                   
    def pprint_table(table):
        """Perl style table output"""
        def format_field(field, fmt='{:,.0f}'):
            if type(field) is str: return field
            if type(field) is tuple: return field[1].format(field[0])
            return fmt.format(field)     

        def get_max_col_w(table, index):
            return max([len(format_field(row[index])) for row in table])         

        col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
        for i,row in enumerate(table):
            # left col
            row_tab=[row[0].ljust(col_paddings[0])]
            # rest of the cols
            row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
            print(' '.join(row_tab))                

    results={}
    for i in range(cnt):
        for f in funcs:
            start=time.perf_counter_ns()
            f(*args)
            stop=time.perf_counter_ns()
            results.setdefault(f.__name__, []).append(stop-start)
    results={k:float(sum(v))/len(v) for k,v in results.items()}     
    fastest=sorted(results,key=results.get, reverse=True)
    table=[['']]
    if rate: table[0].append('rate/sec')
    if micro: table[0].append('\u03bcsec/pass')
    table[0].extend(fastest)
    for e in fastest:
        tmp=[e]
        if rate:
            tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))

        if micro:
            tmp.append('{:,.1f}'.format(results[e]/float(cnt)))

        for x in fastest:
            if x==e: tmp.append('--')
            else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
        table.append(tmp) 

    pprint_table(table)                    



if __name__=='__main__':
    import sys
    import time 
    print(sys.version)
    cases=(
        ('small, found', 9, 100),
        ('small, not found', 99, 100),
        ('large, found', 9, 1000),
        ('large, not found', 99, 1000)
    )
    for txt, tgt, mul in cases:
        print(f'\n{txt}:')
        arr=[1,2,3,4,5,6,7,8,9,0]*mul 
        args=(arr,tgt,'X')
        cmpthese([f1,f2,f3, f4, f5, f6, f7],args)   

And the results:

3.9.1 (default, Feb  3 2021, 07:38:02) 
[Clang 12.0.0 (clang-1200.0.32.29)]

small, found:
   rate/sec μsec/pass     f4     f3     f5     f2     f6     f7     f1
f4  133,982       7.5     -- -38.8% -49.0% -52.5% -78.5% -78.6% -82.9%
f3  219,090       4.6  63.5%     -- -16.6% -22.4% -64.8% -65.0% -72.0%
f5  262,801       3.8  96.1%  20.0%     --  -6.9% -57.8% -58.0% -66.4%
f2  282,259       3.5 110.7%  28.8%   7.4%     -- -54.6% -54.9% -63.9%
f6  622,122       1.6 364.3% 184.0% 136.7% 120.4%     --  -0.7% -20.5%
f7  626,367       1.6 367.5% 185.9% 138.3% 121.9%   0.7%     -- -19.9%
f1  782,307       1.3 483.9% 257.1% 197.7% 177.2%  25.7%  24.9%     --

small, not found:
   rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4   13,846      72.2     -- -40.3% -41.4% -47.8% -85.2% -85.4% -86.2%
f5   23,186      43.1  67.5%     --  -1.9% -12.5% -75.2% -75.5% -76.9%
f2   23,646      42.3  70.8%   2.0%     -- -10.8% -74.8% -75.0% -76.4%
f3   26,512      37.7  91.5%  14.3%  12.1%     -- -71.7% -72.0% -73.5%
f6   93,656      10.7 576.4% 303.9% 296.1% 253.3%     --  -1.0%  -6.5%
f7   94,594      10.6 583.2% 308.0% 300.0% 256.8%   1.0%     --  -5.6%
f1  100,206      10.0 623.7% 332.2% 323.8% 278.0%   7.0%   5.9%     --

large, found:
   rate/sec μsec/pass     f4     f2     f5     f3     f6     f7     f1
f4      145   6,889.4     -- -33.3% -34.8% -48.6% -85.3% -85.4% -85.8%
f2      218   4,593.5  50.0%     --  -2.2% -22.8% -78.0% -78.1% -78.6%
f5      223   4,492.4  53.4%   2.3%     -- -21.1% -77.5% -77.6% -78.2%
f3      282   3,544.0  94.4%  29.6%  26.8%     -- -71.5% -71.6% -72.3%
f6      991   1,009.5 582.4% 355.0% 345.0% 251.1%     --  -0.4%  -2.8%
f7      995   1,005.4 585.2% 356.9% 346.8% 252.5%   0.4%     --  -2.4%
f1    1,019     981.3 602.1% 368.1% 357.8% 261.2%   2.9%   2.5%     --

large, not found:
   rate/sec μsec/pass     f4     f5     f2     f3     f6     f7     f1
f4      147   6,812.0     -- -35.0% -36.4% -48.9% -85.7% -85.8% -86.1%
f5      226   4,424.8  54.0%     --  -2.0% -21.3% -78.0% -78.1% -78.6%
f2      231   4,334.9  57.1%   2.1%     -- -19.6% -77.6% -77.7% -78.2%
f3      287   3,484.0  95.5%  27.0%  24.4%     -- -72.1% -72.2% -72.8%
f6    1,028     972.3 600.6% 355.1% 345.8% 258.3%     --  -0.4%  -2.7%
f7    1,033     968.2 603.6% 357.0% 347.7% 259.8%   0.4%     --  -2.3%
f1    1,057     946.2 619.9% 367.6% 358.1% 268.2%   2.8%   2.3%     --

Upvotes: 4

Jay
Jay

Reputation: 2868

My usecase was replacing None with some default value.

I've timed approaches to this problem that were presented here, including the one by @kxr - using str.count.

Test code in ipython with Python 3.8.1:

def rep1(lst, replacer = 0):
    ''' List comprehension, new list '''

    return [item if item is not None else replacer for item in lst]


def rep2(lst, replacer = 0):
    ''' List comprehension, in-place '''    
    lst[:] =  [item if item is not None else replacer for item in lst]

    return lst


def rep3(lst, replacer = 0):
    ''' enumerate() with comparison - in-place '''
    for idx, item in enumerate(lst):
        if item is None:
            lst[idx] = replacer

    return lst


def rep4(lst, replacer = 0):
    ''' Using str.index + Exception, in-place '''

    idx = -1
    # none_amount = lst.count(None)
    while True:
        try:
            idx = lst.index(None, idx+1)
        except ValueError:
            break
        else:
            lst[idx] = replacer

    return lst


def rep5(lst, replacer = 0):
    ''' Using str.index + str.count, in-place '''

    idx = -1
    for _ in range(lst.count(None)):
        idx = lst.index(None, idx+1)
        lst[idx] = replacer

    return lst


def rep6(lst, replacer = 0):
    ''' Using map, return map iterator '''

    return map(lambda item: item if item is not None else replacer, lst)


def rep7(lst, replacer = 0):
    ''' Using map, return new list '''

    return list(map(lambda item: item if item is not None else replacer, lst))


lst = [5]*10**6
# lst = [None]*10**6

%timeit rep1(lst)    
%timeit rep2(lst)    
%timeit rep3(lst)    
%timeit rep4(lst)    
%timeit rep5(lst)    
%timeit rep6(lst)    
%timeit rep7(lst)    

I get:

26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using the internal str.index is in fact faster than any manual comparison.

I didn't know if the exception in test 4 would be more laborious than using str.count, the difference seems negligible.

Note that map() (test 6) returns an iterator and not an actual list, thus test 7.

Upvotes: 10

Tiago Vieira
Tiago Vieira

Reputation: 170

I know this is a very old question and there's a myriad of ways to do it. The simpler one I found is using numpy package.

import numpy

arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)

Upvotes: 9

damzam
damzam

Reputation: 1961

List comprehension works well, and looping through with enumerate can save you some memory (b/c the operation's essentially being done in place).

There's also functional programming. See usage of map:

>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]

Upvotes: 43

John La Rooy
John La Rooy

Reputation: 304167

>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
...     a[i] = replacement_value
... 
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>> 

Upvotes: 12

Related Questions