Reputation: 26611
Is it possible to find and replace any repetitive characters in a string using C#? I'm trying to reduce the size of a base64 string, which is converted from a jpeg image. I've noticed that the base64 strings contain many repeated characters such as:
6qdQAUUxJA7uuCGQ8g/wA6fQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFFFFABRRRQAUUUUAFYXiFL5b7TrmwtzM8Xmr7KWUAE+
If there was a way to remove the repetitive characters with something like this it would overall be much smaller:
[QAUUUUAFFFFABRRR, 18]
This is in the format of [REPEATED-CHARACTERS, NUMBER-OF-TIMES].
Would this be possible to do? Thanks for the help. :)
Upvotes: 4
Views: 793
Reputation: 6850
You're essentially trying to come up with your own lossless compression algorithm - algorithms like zip work by doing exactly what you're asking for, except that they work on bytes rather than characters in a string.
Popular compression algorithms are virtually guaranteed to be more efficient than something you can design and implement in a reasonable amount of time. For one, they will probably see patterns that aren't evident in the base64 string due to byte alignment issues.
So why not just use one of them to compress the binary data before base64-encoding it, instead of the other way around?
Upvotes: 1
Reputation:
You can find longest string with maximum repeats.
int mx = -1;
string str = null;
for (int i = 0; i < str.Length; i++) for (int j = i + 1; j < str.Length; j++)
{
string sub = str.Substring(i, j - i);
int tmp = countAll(str, sub); // write countAll() yourself
if (tmp > mx) { mx = tmp; str = sub; }
}
Or, better, use a Dictionary
.
Dictionary<char, int> rep = new Dictionary<char, int>();
for (int i = 0; i < str.Length; i++)
if (rep.ContainsKey(str[i])) rep[str[i]]++;
else rep.Add(str[i], 1);
You will have then each character assoicaited with it the number of occurrences:
string total = "";
foreach (var item in rep) total += item.Key;
ADD:
If you really want to find the longest repeated substring, then your should use Dynamic Programming to solve this problem, instead.
Upvotes: 1
Reputation: 167
You would essentially have to create a search and replace function. It really depends on whether or not the repetitive strings are of a constant length. In your example, the repetitive string is 16 characters long, so you could write a routing that grabs the first 16 characters, compares them to the next 16 characters, and so on until it finds a string that is different. It would then replace the string with your syntax to represent them.
If the length of the repetitive string is variable, then it's a little more complex. You would essentially have to start with a short string, and keep growing it, and comparing it to the next set of characters of the same length, if they repeat, check the next ones and so on. This could be hit and miss though.
Do a search on compression algorithms, as many of them work on similar principals.
Upvotes: 1