Reputation: 17631
Let's say I have the following extremely large string in Python 3.x, several GB in size and over 10 billion characters long:
string1 = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY.....YY"
Given its length, this already takes multiple GB of RAM just to hold.
I would like to write a function that will replace every X with A, Y with B, and Z with C. My goal is to make this as quick as possible. Naturally, this should be efficient as well (e.g. there may be some RAM trade-offs I'm not sure about).
The most obvious solution for me is to chain str.replace() calls (note that replace() is a method of str itself; no import from the string module is needed):
def replace_characters(input_string):
    new_string = input_string.replace("X", "A").replace("Y", "B").replace("Z", "C")
    return new_string
foo = replace_characters(string1)
print(foo)
which outputs
'ABCBACCABCCABCBABACBACBACBCBCAB...BB'
I worry this is not the most efficient approach, as I'm chaining three replace() calls, each of which makes a full pass over, and a full copy of, this huge string.
What is the most efficient solution for a string this large?
Upvotes: 0
Views: 3373
Reputation: 23186
A more memory-efficient method, one that will not generate so many temporary strings along the way, would be to use str.translate.
>>> string1 = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY"
>>> string1.translate({ord("X"): "A", ord("Y"): "B", ord("Z"): "C"})
'ABCBACCABCCABCBABACBACBACBCBCAB'
This will allocate just one (extra large in your case) string.
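As a sketch of the same idea, str.maketrans can build the translation table up front, and for pure-ASCII data a bytes-level variant may be faster still, since bytes.translate uses a flat 256-entry lookup table. The bytes step is an extra suggestion, not part of the answer above:

```python
# Build the translation table once; str.translate then maps every
# character in a single pass, allocating only one result string.
table = str.maketrans("XYZ", "ABC")

string1 = "XYZYXZZXYZZXYZYXYXZYXZYXZYZYZXY"
print(string1.translate(table))  # -> ABCBACCABCCABCBABACBACBACBCBCAB

# If the data is pure ASCII, operating on bytes avoids per-character
# dict lookups: bytes.translate indexes a fixed 256-byte table.
data = string1.encode("ascii")
btable = bytes.maketrans(b"XYZ", b"ABC")
print(data.translate(btable).decode("ascii"))
```

Whether the bytes round-trip pays off depends on the data; for a string this large, encoding and decoding each allocate another full-size copy, so it is worth benchmarking on a sample first.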
Upvotes: 7