Reputation: 77
I have the following two strings:
uncompressed "(A(2),I(10),A,A,A,A(3),R,R,R,R,A,A)"
compressed "(A(2),I(10),3A,A(3),4R,2A)"
Ignoring entries of the form A(n) or I(n), you can see that each run of consecutive repeated characters is replaced by a single entry at that position, prefixed with a count.
I know there must be an elegant way to do this, but I keep coming up with ugly-looking nested loops.
The data in the strings comes from the ISO8211 file format and identifies the format to apply to the data in the subfields.
I am sure this could be done with a single line of LINQ, but I am out of ideas (tonight).
Upvotes: 2
Views: 691
Reputation: 225238
Here's a method to do that, using LINQ's GroupBy:
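static string RLE(string s) {
    s = s.Substring(1, s.Length - 2);   // strip the outer parentheses
    char? l = null;                     // letter of the current run
    int i = 0;                          // group key counter
    return "(" + string.Join(",", s.Split(',').GroupBy(c => {
        if(c.Length != 1) {
            // Entries such as A(2) or I(10) always get a key of their own.
            i++;
            return i++;
        }
        if(c[0] == l) {
            return i;                   // same letter: reuse the current key
        }
        l = c[0];
        return ++i;                     // new letter: start a new group
    }).Select(x => (x.Count() > 1 ? x.Count().ToString() : string.Empty) + x.First())) + ")";
}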
Upvotes: 0
Reputation: 226674
The technique is called Run Length Encoding.
Here's an example using Python:
from itertools import groupby
uncompressed = "(A(2),I(10),A,A,A,A(3),R,R,R,R,A,A)"
entries = uncompressed[1:-1].split(',')  # strip the outer parentheses before splitting
counted = [(k, len(list(g))) for k, g in groupby(entries)]
compressed = '(' + ','.join(k if cnt == 1 else str(cnt) + k for k, cnt in counted) + ')'
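The question asks that entries of the form A(n) or I(n) be left alone, whereas a plain groupby would also collapse two identical parameterized entries such as A(2),A(2). A minimal sketch of one way to handle that, assuming every collapsible entry is a single letter; the names rle_compress and PLAIN are made up for illustration and are not part of any library:

```python
import re
from itertools import groupby

# Assumption: only single-letter entries (e.g. "A", "R") may be collapsed.
PLAIN = re.compile(r"[A-Za-z]")

def rle_compress(fmt):
    """Collapse runs of identical single-letter entries; keep A(n)/I(n) as they are."""
    entries = fmt[1:-1].split(",")      # drop the outer parentheses first
    out = []
    for key, group in groupby(entries):
        n = len(list(group))
        if n > 1 and PLAIN.fullmatch(key):
            out.append(f"{n}{key}")     # e.g. three "A" entries become "3A"
        else:
            out.extend([key] * n)       # parameterized entries are kept one by one
    return "(" + ",".join(out) + ")"

print(rle_compress("(A(2),I(10),A,A,A,A(3),R,R,R,R,A,A)"))
# (A(2),I(10),3A,A(3),4R,2A)
```

With this version, "(A(2),A(2),B)" comes back unchanged instead of becoming "(2A(2),B)".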
Upvotes: 2
Reputation: 700780
Well, not exactly a single line. This will do it:
string str = "(A(2),I(10),A,A,A,A(3),R,R,R,R,A,A)";
string prev = null;
int cnt = 0;
string result =
    "(" + String.Join(",",
        // Substring removes exactly one "(" and one ")", so a final entry like A(3) keeps its own ")".
        // The trailing "," adds an empty last entry that flushes the final run.
        (str.Substring(1, str.Length - 2) + ",").Split(',').Select(x => {
            if (x == prev) {
                cnt++;
                return null;
            } else {
                string temp = cnt > 1 ? cnt.ToString() + prev : prev;
                prev = x;
                cnt = 1;
                return temp;
            }
        }).Where(x => x != null)
    ) + ")";
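To check the output of any of the answers here, it can help to expand the compressed string again and compare it with the original. A rough sketch in Python, assuming every entry is an optional decimal count followed by the entry text; rle_expand and ENTRY are hypothetical names used only for this example:

```python
import re

# Assumed entry shape: optional decimal count, then the entry itself ("3A", "A(3)", "R").
ENTRY = re.compile(r"(\d*)(.+)")

def rle_expand(fmt):
    """Expand counted entries such as "3A" back into "A,A,A"."""
    out = []
    for entry in fmt[1:-1].split(","):   # drop the outer parentheses first
        match = ENTRY.fullmatch(entry)
        if match is None:                # empty entry: pass it through untouched
            out.append(entry)
            continue
        count, body = match.groups()
        out.extend([body] * (int(count) if count else 1))
    return "(" + ",".join(out) + ")"

print(rle_expand("(A(2),I(10),3A,A(3),4R,2A)"))
# (A(2),I(10),A,A,A,A(3),R,R,R,R,A,A)
```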
Upvotes: 1