Reputation: 78
I'm working with a string of bytes (anywhere between 10 KB and 3 MB) and I need to filter out approximately 16 byte values, replacing them with other bytes.
At the moment I have a function a bit like this:
BYTE_REPLACE = {
    52: 7,   # first number is the byte I want to replace
    53: 12,  # while the second number is the byte I want to replace it WITH
}
def filter(st):
    for b in BYTE_REPLACE:
        st = st.replace(chr(b), chr(BYTE_REPLACE[b]))
    return st
(Byte list paraphrased for the sake of this question)
Using map resulted in an execution time of ~0.33s, while this runs about 10x faster at ~0.03s (both measured on a HUGE string, larger than 1.5MB compressed).
While any further performance gains would probably be negligible, is there a better way of doing this?
(I am aware that it would be much more optimal to store the filtered string. This isn't an option, though. I'm fooling with a Minecraft Classic server's level format and have to filter out bytes that certain clients don't support)
Upvotes: 4
Views: 2755
Reputation: 369134
Use str.translate (in Python 3.x it accepts a dict mapping character ordinals directly):
def subs(st):
    return st.translate(BYTE_REPLACE)
Example usage:
>>> subs('4567')
'\x07\x0c67'
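In Python 2.x, build a translation table with string.maketrans first:
import string
# Split the mapping into parallel tuples of source and target byte values
k, v = zip(*BYTE_REPLACE.iteritems())
# Turn each tuple of ints into a string of the corresponding characters
k, v = ''.join(map(chr, k)), ''.join(map(chr, v))
tbl = string.maketrans(k, v)
def subs(st):
    return st.translate(tbl)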
Upvotes: 7
Reputation: 70602
Look up the translate() method on strings. That allows you to do any number of 1-byte transformations in a single pass over the string. Use the string.maketrans() function to build the translation table. If you usually have 16 pairs, this should run about 16 times faster than doing 1-byte replacements 16 times.
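As a rough illustration of the single-pass approach described above, here is a minimal sketch. It assumes Python 3, where the level data would be a bytes object and the table is built with bytes.maketrans rather than string.maketrans; the name filter_bytes is made up for this example.

```python
# Assumption: Python 3, with the level data held in a bytes object.
BYTE_REPLACE = {
    52: 7,   # byte value to replace -> replacement byte value
    53: 12,
}

# bytes.maketrans takes two equal-length bytes objects (sources, targets)
# and returns a 256-byte lookup table. A dict's keys() and values() iterate
# in matching order, so the pairs stay aligned.
TABLE = bytes.maketrans(bytes(BYTE_REPLACE.keys()), bytes(BYTE_REPLACE.values()))

def filter_bytes(data):
    """Apply every replacement in one pass over the data (hypothetical helper)."""
    return data.translate(TABLE)

result = filter_bytes(b"4567")
print(result)
```

One behavioural difference to keep in mind: repeated replace() calls can chain (a byte produced by one replacement may be rewritten by a later one), whereas a translation table maps each input byte exactly once.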
Upvotes: 4
Reputation:
In your current design, str.replace() is being called on the string n times, once for each pair. While it's most likely an efficient algorithm, on a 3MB string it might slow down.
If the string is already contained in memory by the time this function is called, I'd wager that the most efficient way would be:
BYTE_REPLACE = {
    52: 7,   # first number is the byte I want to replace
    53: 12,  # while the second number is the byte I want to replace it WITH
}
def filter(st):
    st = list(st)  # Convert string to list to edit in place :/
    for i, s in enumerate(st):  # iterate through list
        if ord(s) in BYTE_REPLACE:  # dict membership test, no need for .keys()
            st[i] = chr(BYTE_REPLACE[ord(s)])  # write the replacement back into the list
    return "".join(st)  # return string
There is a large operation to create a new list at the start, and another to convert back to a string, but since Python strings are immutable, in your design a new string is made for each replacement.
This is all based on conjecture, and could be wrong. You'd want to test it with your actual data.
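One way to run that test is with timeit. The sketch below is only an illustration: it assumes Python 3 and bytes input, uses 1 MB of random bytes as a stand-in for real level data, and the function names are invented for the comparison. Actual timings will depend on the machine and the data.

```python
import os
import timeit

BYTE_REPLACE = {52: 7, 53: 12}
TABLE = bytes.maketrans(bytes(BYTE_REPLACE.keys()), bytes(BYTE_REPLACE.values()))

def filter_replace(data):
    # One full pass over the data per replacement pair.
    for src, dst in BYTE_REPLACE.items():
        data = data.replace(bytes([src]), bytes([dst]))
    return data

def filter_translate(data):
    # A single pass using the precomputed 256-byte table.
    return data.translate(TABLE)

# Assumption: random bytes stand in for a real level file.
sample = os.urandom(1024 * 1024)

# Sanity check that both approaches agree before comparing speed.
same_output = filter_replace(sample) == filter_translate(sample)

replace_time = timeit.timeit(lambda: filter_replace(sample), number=5)
translate_time = timeit.timeit(lambda: filter_translate(sample), number=5)
print("same output:", same_output)
print("replace:   %.4fs" % replace_time)
print("translate: %.4fs" % translate_time)
```

The two functions only agree here because no replacement byte (7, 12) is itself a key in BYTE_REPLACE; with overlapping pairs the chained replace() version could give different results.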
Upvotes: 0