Reputation: 4307
I have to search through a list and replace all occurrences of one element with another. So far my attempts in code are getting me nowhere, what is the best way to do this?
For example, suppose my list has the following integers
a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
and I need to replace all occurrences of the number 1 with the value 10 so the output I need is
a = [10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
Thus my goal is to replace all instances of the number 1 with the number 10.
Upvotes: 428
Views: 1391275
Reputation: 23131
In many cases, defining a replacer function and calling it in a loop is very readable. It's very useful if values need to be replaced using some rule. For example, to replace values depending on whether it's divisible by 15, 3 or 5, one can define a function that checks conditions and returns an appropriate value.
def fizzbuzz(num):
if num % 15 == 0:
return 'FizzBuzz'
elif num % 3 == 0:
return 'Fizz'
elif num % 5 == 0:
return 'Buzz'
else:
return num
a = [1, 2, 3, 4, 5, 10, 15, 1]
a[:] = (fizzbuzz(n) for n in a)
# or
a[:] = map(fizzbuzz, a)
a # [1, 2, 'Fizz', 4, 1, 'Buzz', 'Fizz', 2, 'Fizz', 1, 1]
One thing to note is that (at least as of python 3.10) if a function needs to be called on every element in a list, then map()
is faster than a comprehension.1
This difference is even more pronounced for built-in methods such as dict.get()
.2 So @roipoussiere's solution can be made twice as fast by simply mapping it like the following.
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacer = {1:10, 2:20, 3:'foo'}.get
a[:] = map(replacer, a, a) # in-place replacement
a1 = [*map(replacer, a, a)] # new list
1 map()
is ~20% faster than calling a function in a list comprehension in the example below.
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]*10000
%timeit [fizzbuzz(n) for n in a]
# 15.1 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit list(map(fizzbuzz, a))
# 12.7 ms ± 27.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2 When replacing values using dict.get()
, mapping it is ~2 times faster than calling it in a list comprehension.
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]*10000
replacer = {1:10, 2:20, 3:'foo'}.get
%timeit [replacer(n,n) for n in a]
# 4.64 ms ± 195 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit list(map(replacer, a, a))
# 2.21 ms ± 13.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Upvotes: 1
Reputation: 5529
On long lists and rare occurrences its about 3x faster using list.index()
- compared to single step iteration methods presented in the other answers.
def list_replace(lst, old=1, new=10):
"""replace list elements (inplace)"""
i = -1
try:
while True:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass
Upvotes: 15
Reputation: 342363
You can use the built-in enumerate
to get both index and value while iterating the list. Then, use the value to test for a condition and the index to replace that value in the original list:
>>> a = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1]
>>> for i, n in enumerate(a):
... if n == 1:
... a[i] = 10
...
>>> a
[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
Upvotes: 319
Reputation: 379
I might be a dumb-dumb, but I would write a separate, simple function for this:
def convertElements( oldlist, convert_dict ):
newlist = []
for e in oldlist:
if e in convert_dict:
newlist.append(convert_dict[e])
else:
newlist.append(e)
return newlist
And then call this as needed like so:
a = [1,2,3,4,5,1,2,3,4,5,1]
a_new = convertElements(a, {1: 10})
## OUTPUT: a_new=[10, 2, 3, 4, 5, 10, 2, 3, 4, 5, 10]
Upvotes: 0
Reputation: 77400
Try using a list comprehension and a conditional expression.
>>> a=[1,2,3,1,3,2,1,1]
>>> [4 if x==1 else x for x in a]
[4, 2, 3, 4, 3, 2, 4, 4]
Upvotes: 717
Reputation: 5956
If you have several values to replace, you can also use a dictionary:
a = [1, 2, 3, 4, 1, 5, 3, 2, 6, 1, 1]
replacements = {1:10, 2:20, 3:'foo'}
replacer = replacements.get # For faster gets.
print([replacer(n, n) for n in a])
> [10, 20, 'foo', 4, 10, 5, 'foo', 20, 6, 10, 10]
Note that this approach works only if the elements to be replaced are hashable. This is because dict keys are required to be hashable.
Upvotes: 88
Reputation: 103834
The answers for this old but relevant question are wildly variable in speed.
The fastest of the solution posted by kxr.
However, this is even faster and otherwise not here:
def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1
Here is timing for the various solutions. The faster ones are 3X faster than accepted answer and 5X faster than the slowest answer here.
To be fair, all methods needed to do inlace replacement of the array sent to the function.
Please see timing code below:
def f1(arr, find, replace):
# fast and readable
base=0
for cnt in range(arr.count(find)):
offset=arr.index(find, base)
arr[offset]=replace
base=offset+1
def f2(arr,find,replace):
# accepted answer
for i,e in enumerate(arr):
if e==find:
arr[i]=replace
def f3(arr,find,replace):
# in place list comprehension
arr[:]=[replace if e==find else e for e in arr]
def f4(arr,find,replace):
# in place map and lambda -- SLOW
arr[:]=list(map(lambda x: x if x != find else replace, arr))
def f5(arr,find,replace):
# find index with comprehension
for i in [i for i, e in enumerate(arr) if e==find]:
arr[i]=replace
def f6(arr,find,replace):
# FASTEST but a little les clear
try:
while True:
arr[arr.index(find)]=replace
except ValueError:
pass
def f7(lst, old, new):
"""replace list elements (inplace)"""
i = -1
try:
while 1:
i = lst.index(old, i + 1)
lst[i] = new
except ValueError:
pass
import time
def cmpthese(funcs, args=(), cnt=1000, rate=True, micro=True):
"""Generate a Perl style function benchmark"""
def pprint_table(table):
"""Perl style table output"""
def format_field(field, fmt='{:,.0f}'):
if type(field) is str: return field
if type(field) is tuple: return field[1].format(field[0])
return fmt.format(field)
def get_max_col_w(table, index):
return max([len(format_field(row[index])) for row in table])
col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
for i,row in enumerate(table):
# left col
row_tab=[row[0].ljust(col_paddings[0])]
# rest of the cols
row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
print(' '.join(row_tab))
results={}
for i in range(cnt):
for f in funcs:
start=time.perf_counter_ns()
f(*args)
stop=time.perf_counter_ns()
results.setdefault(f.__name__, []).append(stop-start)
results={k:float(sum(v))/len(v) for k,v in results.items()}
fastest=sorted(results,key=results.get, reverse=True)
table=[['']]
if rate: table[0].append('rate/sec')
if micro: table[0].append('\u03bcsec/pass')
table[0].extend(fastest)
for e in fastest:
tmp=[e]
if rate:
tmp.append('{:,}'.format(int(round(float(cnt)*1000000.0/results[e]))))
if micro:
tmp.append('{:,.1f}'.format(results[e]/float(cnt)))
for x in fastest:
if x==e: tmp.append('--')
else: tmp.append('{:.1%}'.format((results[x]-results[e])/results[e]))
table.append(tmp)
pprint_table(table)
if __name__=='__main__':
import sys
import time
print(sys.version)
cases=(
('small, found', 9, 100),
('small, not found', 99, 100),
('large, found', 9, 1000),
('large, not found', 99, 1000)
)
for txt, tgt, mul in cases:
print(f'\n{txt}:')
arr=[1,2,3,4,5,6,7,8,9,0]*mul
args=(arr,tgt,'X')
cmpthese([f1,f2,f3, f4, f5, f6, f7],args)
And the results:
3.9.1 (default, Feb 3 2021, 07:38:02)
[Clang 12.0.0 (clang-1200.0.32.29)]
small, found:
rate/sec μsec/pass f4 f3 f5 f2 f6 f7 f1
f4 133,982 7.5 -- -38.8% -49.0% -52.5% -78.5% -78.6% -82.9%
f3 219,090 4.6 63.5% -- -16.6% -22.4% -64.8% -65.0% -72.0%
f5 262,801 3.8 96.1% 20.0% -- -6.9% -57.8% -58.0% -66.4%
f2 282,259 3.5 110.7% 28.8% 7.4% -- -54.6% -54.9% -63.9%
f6 622,122 1.6 364.3% 184.0% 136.7% 120.4% -- -0.7% -20.5%
f7 626,367 1.6 367.5% 185.9% 138.3% 121.9% 0.7% -- -19.9%
f1 782,307 1.3 483.9% 257.1% 197.7% 177.2% 25.7% 24.9% --
small, not found:
rate/sec μsec/pass f4 f5 f2 f3 f6 f7 f1
f4 13,846 72.2 -- -40.3% -41.4% -47.8% -85.2% -85.4% -86.2%
f5 23,186 43.1 67.5% -- -1.9% -12.5% -75.2% -75.5% -76.9%
f2 23,646 42.3 70.8% 2.0% -- -10.8% -74.8% -75.0% -76.4%
f3 26,512 37.7 91.5% 14.3% 12.1% -- -71.7% -72.0% -73.5%
f6 93,656 10.7 576.4% 303.9% 296.1% 253.3% -- -1.0% -6.5%
f7 94,594 10.6 583.2% 308.0% 300.0% 256.8% 1.0% -- -5.6%
f1 100,206 10.0 623.7% 332.2% 323.8% 278.0% 7.0% 5.9% --
large, found:
rate/sec μsec/pass f4 f2 f5 f3 f6 f7 f1
f4 145 6,889.4 -- -33.3% -34.8% -48.6% -85.3% -85.4% -85.8%
f2 218 4,593.5 50.0% -- -2.2% -22.8% -78.0% -78.1% -78.6%
f5 223 4,492.4 53.4% 2.3% -- -21.1% -77.5% -77.6% -78.2%
f3 282 3,544.0 94.4% 29.6% 26.8% -- -71.5% -71.6% -72.3%
f6 991 1,009.5 582.4% 355.0% 345.0% 251.1% -- -0.4% -2.8%
f7 995 1,005.4 585.2% 356.9% 346.8% 252.5% 0.4% -- -2.4%
f1 1,019 981.3 602.1% 368.1% 357.8% 261.2% 2.9% 2.5% --
large, not found:
rate/sec μsec/pass f4 f5 f2 f3 f6 f7 f1
f4 147 6,812.0 -- -35.0% -36.4% -48.9% -85.7% -85.8% -86.1%
f5 226 4,424.8 54.0% -- -2.0% -21.3% -78.0% -78.1% -78.6%
f2 231 4,334.9 57.1% 2.1% -- -19.6% -77.6% -77.7% -78.2%
f3 287 3,484.0 95.5% 27.0% 24.4% -- -72.1% -72.2% -72.8%
f6 1,028 972.3 600.6% 355.1% 345.8% 258.3% -- -0.4% -2.7%
f7 1,033 968.2 603.6% 357.0% 347.7% 259.8% 0.4% -- -2.3%
f1 1,057 946.2 619.9% 367.6% 358.1% 268.2% 2.8% 2.3% --
Upvotes: 4
Reputation: 2868
My usecase was replacing None
with some default value.
I've timed approaches to this problem that were presented here, including the one by @kxr - using str.count
.
Test code in ipython with Python 3.8.1:
def rep1(lst, replacer = 0):
''' List comprehension, new list '''
return [item if item is not None else replacer for item in lst]
def rep2(lst, replacer = 0):
''' List comprehension, in-place '''
lst[:] = [item if item is not None else replacer for item in lst]
return lst
def rep3(lst, replacer = 0):
''' enumerate() with comparison - in-place '''
for idx, item in enumerate(lst):
if item is None:
lst[idx] = replacer
return lst
def rep4(lst, replacer = 0):
''' Using str.index + Exception, in-place '''
idx = -1
# none_amount = lst.count(None)
while True:
try:
idx = lst.index(None, idx+1)
except ValueError:
break
else:
lst[idx] = replacer
return lst
def rep5(lst, replacer = 0):
''' Using str.index + str.count, in-place '''
idx = -1
for _ in range(lst.count(None)):
idx = lst.index(None, idx+1)
lst[idx] = replacer
return lst
def rep6(lst, replacer = 0):
''' Using map, return map iterator '''
return map(lambda item: item if item is not None else replacer, lst)
def rep7(lst, replacer = 0):
''' Using map, return new list '''
return list(map(lambda item: item if item is not None else replacer, lst))
lst = [5]*10**6
# lst = [None]*10**6
%timeit rep1(lst)
%timeit rep2(lst)
%timeit rep3(lst)
%timeit rep4(lst)
%timeit rep5(lst)
%timeit rep6(lst)
%timeit rep7(lst)
I get:
26.3 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.3 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
33.8 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
11.9 ms ± 37.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.9 ms ± 60.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
260 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
56.5 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Using the internal str.index
is in fact faster than any manual comparison.
I didn't know if the exception in test 4 would be more laborious than using str.count
, the difference seems negligible.
Note that map()
(test 6) returns an iterator and not an actual list, thus test 7.
Upvotes: 10
Reputation: 170
I know this is a very old question and there's a myriad of ways to do it. The simpler one I found is using numpy
package.
import numpy
arr = numpy.asarray([1, 6, 1, 9, 8])
arr[ arr == 8 ] = 0 # change all occurrences of 8 by 0
print(arr)
Upvotes: 9
Reputation: 1961
List comprehension works well, and looping through with enumerate can save you some memory (b/c the operation's essentially being done in place).
There's also functional programming. See usage of map:
>>> a = [1,2,3,2,3,4,3,5,6,6,5,4,5,4,3,4,3,2,1]
>>> map(lambda x: x if x != 4 else 'sss', a)
[1, 2, 3, 2, 3, 'sss', 3, 5, 6, 6, 5, 'sss', 5, 'sss', 3, 'sss', 3, 2, 1]
Upvotes: 43
Reputation: 304167
>>> a=[1,2,3,4,5,1,2,3,4,5,1]
>>> item_to_replace = 1
>>> replacement_value = 6
>>> indices_to_replace = [i for i,x in enumerate(a) if x==item_to_replace]
>>> indices_to_replace
[0, 5, 10]
>>> for i in indices_to_replace:
... a[i] = replacement_value
...
>>> a
[6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
>>>
Upvotes: 12