Reputation: 396
Assume that I get a few hundred lines of text as a C++ string from an API, and sprinkled into that data are German umlauts, such as ä or ö, which need to be replaced with ae and oe. I'm familiar with encoding (well, I've read http://www.joelonsoftware.com/articles/Unicode.html), and solving the problem was trivial (basically, searching through the string, removing each umlaut and inserting the two replacement characters instead).
However, I don't know enough about C++ to do this quickly. I've just stumbled upon StringBuilder (http://www.codeproject.com/Articles/647856/4350-Performance-Improvement-with-the-StringBuilde), which improved speed a lot, but I'm curious whether there are any better or smarter ways to do this.
Upvotes: 1
Views: 2990
Reputation: 69663
When the text is encoded in UTF-8, the German umlauts such as ä or ö are each encoded as two bytes, and so are their replacements ae and oe. So if you work on the raw bytes (a char[], or a std::string modified through operator[]) instead of erasing and inserting characters, you don't have to reallocate any memory and can simply overwrite the bytes while iterating over the buffer.
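A minimal sketch of that in-place approach, assuming the input is valid UTF-8 held in a std::string. The function name replaceUmlautsInPlace is hypothetical, and covering ü, the capital umlauts and ß (replaced with ss) goes beyond the two letters mentioned in the question. In UTF-8 each of these characters is the byte 0xC3 followed by one continuation byte, so the loop only needs to inspect pairs that start with 0xC3:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: replaces UTF-8 encoded German umlauts (and ß) in place.
// Assumes valid UTF-8. Each umlaut is the two-byte sequence 0xC3 0xXX and each
// replacement is two ASCII bytes, so the string length never changes and no
// memory is reallocated.
void replaceUmlautsInPlace(std::string& s) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) != 0xC3) continue;

        const char* repl = nullptr;
        switch (static_cast<unsigned char>(s[i + 1])) {
            case 0xA4: repl = "ae"; break; // ä (U+00E4)
            case 0xB6: repl = "oe"; break; // ö (U+00F6)
            case 0xBC: repl = "ue"; break; // ü (U+00FC)
            case 0x84: repl = "Ae"; break; // Ä (U+00C4)
            case 0x96: repl = "Oe"; break; // Ö (U+00D6)
            case 0x9C: repl = "Ue"; break; // Ü (U+00DC)
            case 0x9F: repl = "ss"; break; // ß (U+00DF)
            default:   break;              // other 0xC3 xx characters, e.g. é, stay as-is
        }
        if (repl) {
            s[i] = repl[0];
            s[i + 1] = repl[1];
            ++i; // skip the second byte we just overwrote
        }
    }
}
```

Because the length is unchanged, this is a single pass with no allocation. It does not handle decomposed text (a plain a followed by the combining diaeresis U+0308), which would need Unicode normalization first.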
Upvotes: 2
Reputation: 726639
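If you must improve efficiency on such a small scale, consider doing the replacement in two phases:
1. Walk through the string and count the length of the result: add 1 to the count for each normal character; for characters such as ä or ö, add 2.
2. Allocate the result buffer once with exactly that length, then walk through the string again, copying normal characters and writing the two-character replacement for each umlaut.
Upvotes: 4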
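A sketch of the two-phase approach. It assumes a single-byte encoding such as ISO-8859-1 (Latin-1), where ä is the single byte 0xE4, so each umlaut grows from one byte to two (in UTF-8 the length would not change, as the other answer points out). The names umlautReplacement and replaceUmlautsLatin1 are hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the two-character replacement for a
// Latin-1 (ISO-8859-1) umlaut byte, or nullptr if c is not one.
const char* umlautReplacement(unsigned char c) {
    switch (c) {
        case 0xE4: return "ae"; // ä
        case 0xF6: return "oe"; // ö
        case 0xFC: return "ue"; // ü
        case 0xC4: return "Ae"; // Ä
        case 0xD6: return "Oe"; // Ö
        case 0xDC: return "Ue"; // Ü
        case 0xDF: return "ss"; // ß
        default:   return nullptr;
    }
}

std::string replaceUmlautsLatin1(const std::string& in) {
    // Phase 1: compute the exact output length
    // (1 per normal character, 2 per umlaut).
    std::size_t outLen = 0;
    for (char ch : in)
        outLen += umlautReplacement(static_cast<unsigned char>(ch)) ? 2 : 1;

    // Phase 2: allocate once, then fill without further reallocation.
    std::string out;
    out.reserve(outLen);
    for (char ch : in) {
        if (const char* r = umlautReplacement(static_cast<unsigned char>(ch)))
            out.append(r, 2);
        else
            out += ch;
    }
    return out;
}
```

The reserve call guarantees that the appends in phase 2 never trigger a reallocation, which is the same benefit a StringBuilder-style class provides, without repeated erase/insert calls shifting the rest of the string.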