Birgit P.

Reputation: 396

How to efficiently replace German umlauts in C++?

Assume that I get a few hundred lines of text as a string (C++) from an API, and sprinkled into that data are German umlauts, such as ä or ö, which need to be replaced with ae and oe. I'm familiar with encoding (well, I've read http://www.joelonsoftware.com/articles/Unicode.html), and solving the problem was trivial (basically, searching through the string, removing the character, and adding two others in its place).

However, I do not know enough about C++ to do this fast. I've just stumbled upon StringBuilder (http://www.codeproject.com/Articles/647856/4350-Performance-Improvement-with-the-StringBuilde), which improved speed a lot, but I'm curious whether there are any better or smarter ways to do this.

Upvotes: 1

Views: 2990

Answers (2)

Philipp

Reputation: 69663

When the text is encoded in UTF-8, each German umlaut is a two-byte sequence, and so is its replacement, such as ae or oe. So if you use a char[] instead of a string, you don't have to reallocate any memory and can simply overwrite the bytes in place while iterating over the char[].
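
A minimal sketch of this in-place idea, assuming the input really is valid UTF-8. It works on a mutable std::string rather than a raw char[], since the string's buffer can be overwritten the same way. The function name replace_umlauts_utf8 is made up for illustration, and covering ü, the uppercase umlauts, and ß (as ss) goes beyond the ä and ö mentioned in the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: overwrites UTF-8 encoded German umlauts in place.
// Assumption: s is valid UTF-8. Each umlaut is the lead byte 0xC3 followed by
// one continuation byte, so a two-letter ASCII replacement fits exactly.
void replace_umlauts_utf8(std::string& s) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (static_cast<unsigned char>(s[i]) != 0xC3) continue;

        const char* rep = nullptr;
        switch (static_cast<unsigned char>(s[i + 1])) {
            case 0xA4: rep = "ae"; break;  // ä
            case 0xB6: rep = "oe"; break;  // ö
            case 0xBC: rep = "ue"; break;  // ü
            case 0x84: rep = "Ae"; break;  // Ä
            case 0x96: rep = "Oe"; break;  // Ö
            case 0x9C: rep = "Ue"; break;  // Ü
            case 0x9F: rep = "ss"; break;  // ß
            default: break;                // some other character, leave it alone
        }
        if (rep != nullptr) {
            s[i] = rep[0];
            s[i + 1] = rep[1];
            ++i;  // skip the byte that was just overwritten
        }
    }
}
```

Because the string never changes length, no allocation or copying takes place. Other two-byte characters that start with 0xC3, such as é, are left untouched.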

Upvotes: 2

Sergey Kalinichenko

Reputation: 726639

If you must improve efficiency on such a small scale, consider doing the replacement in two phases:

  • The first phase calculates the number of characters in the result after the replacement. Go through the string, and add 1 to the count for each normal character; for characters such as ä or ö, add 2.
  • At this point, you have enough information to allocate the string for the result. Make a string of the length that you counted in the first phase.
  • The second phase performs the actual replacement: go through the string again, copying the regular characters, and replacing umlauted ones with their corresponding pairs.
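
The steps above could be sketched roughly as follows. Counting 1 per normal character and 2 per umlaut implies a single-byte encoding, so this sketch assumes ISO-8859-1 (Latin-1) input. That encoding is an assumption, as are the names umlaut_replacement and replace_umlauts_latin1 and the handling of ü, the uppercase umlauts, and ß.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the two-letter replacement for a Latin-1
// umlaut byte, or nullptr for any other byte.
static const char* umlaut_replacement(unsigned char c) {
    switch (c) {
        case 0xE4: return "ae";  // ä
        case 0xF6: return "oe";  // ö
        case 0xFC: return "ue";  // ü
        case 0xC4: return "Ae";  // Ä
        case 0xD6: return "Oe";  // Ö
        case 0xDC: return "Ue";  // Ü
        case 0xDF: return "ss";  // ß
        default:   return nullptr;
    }
}

// Assumption: in is encoded as ISO-8859-1, one byte per character.
std::string replace_umlauts_latin1(const std::string& in) {
    // Phase 1: count the length of the result.
    std::size_t length = 0;
    for (unsigned char c : in) {
        length += (umlaut_replacement(c) != nullptr) ? 2 : 1;
    }

    // Allocate the result once, at its final size.
    std::string out(length, '\0');

    // Phase 2: copy regular characters and expand the umlauts.
    std::size_t pos = 0;
    for (unsigned char c : in) {
        if (const char* rep = umlaut_replacement(c)) {
            out[pos++] = rep[0];
            out[pos++] = rep[1];
        } else {
            out[pos++] = static_cast<char>(c);
        }
    }
    return out;
}
```

The input is read twice, but the result is allocated exactly once, which avoids the repeated reallocation that appending to a growing string can cause.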

Upvotes: 4
