Reputation: 1037
I have an extension to a question already asked here.
However, I want to return the longest set of recurring characters in the original string, not a list of chars and their relative counts, ordered by highest.
I am fairly well versed in LINQ, but have never come across an instance of querying char types in a string and thought someone could give me a hint to help me understand specific use-cases of LINQ...
Thanks
Upvotes: 0
Views: 1842
Reputation: 20610
There is no need to create lots of intermediate objects. You just need to keep track of the character in the longest sequence and the length of that sequence:
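char longest = '\0';
int longestLength = 0;
char last = '\0';
int lastLength = 0;
foreach (char c in input)
{
    // Extend the current run, or start a new run of length 1.
    // The lastLength check keeps a leading '\0' from matching the initial value of last.
    if (lastLength > 0 && c == last)
    {
        lastLength++;
    }
    else
    {
        lastLength = 1;
    }
    // Compare on every character so that runs of length 1 are counted too;
    // otherwise an input with no repeats (e.g. "abc") would yield an empty result.
    if (lastLength > longestLength)
    {
        longestLength = lastLength;
        longest = c;
    }
    last = c;
}
var result = new string(longest, longestLength);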
Upvotes: 2
Reputation: 20772
I'm assuming that you want the longest substring made up of a single repeated character. For example, for aab💈💈💈ccc💈💈 you want 💈💈💈.
I also assume the problem domain is strings of Unicode characters. Unfortunately, .NET's System.String is a sequence of UTF-16 code units. To count or index Unicode characters, you have to deal with them as codepoints. The easiest way to do that is to change the encoding to UTF-32 since there is then one int per codepoint, and a codepoint is a numeric identifier for a Unicode character [generally speaking].
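To make the code unit versus codepoint distinction concrete, here is a small sketch. The class and method names (CodePointDemo, CodeUnitCount, CodePointCount) are made up for illustration; only Char.IsSurrogatePair and String.Length are framework members.

```csharp
using System;

public static class CodePointDemo
{
    // Number of UTF-16 code units, which is what String.Length reports.
    public static int CodeUnitCount(string s) => s.Length;

    // Number of Unicode codepoints, counting each valid surrogate pair once.
    public static int CodePointCount(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // A valid high + low surrogate pair encodes a single codepoint,
            // so skip the second half of the pair.
            if (Char.IsSurrogatePair(s, i))
            {
                i++;
            }
            count++;
        }
        return count;
    }
}
```

For the barber pole character (U+1F488), CodeUnitCount returns 2 while CodePointCount returns 1, which is why comparing char by char would see each 💈 as two separate items.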
After that, to find the longest subsequence of identical characters, you have to run through the whole sequence. Run-length encoding is a generalized method that I'm using as an intermediate step. After finding the codepoint and length for the longest subsequence, I recreate a string of them.
const string test = "aab💈💈💈ccc💈💈"; // contains barber pole characters
Console.WriteLine(test);
var longest = test.ToCodepoints().RunLengthEncode().OrderByDescending(itemCount => itemCount.Item2).First();
var subsequence = String.Concat(Enumerable.Repeat(Char.ConvertFromUtf32(longest.Item1), longest.Item2));
Console.WriteLine(subsequence);
Converting a string to codepoints is equivalent to converting to UTF-32. It can be done with a System.Text.Encoding method, but then you end up with an array of bytes that must then be converted to codepoints. Here is an extension method that yields the codepoints as an IEnumerable<int>.
// Extension methods must be declared in a static class.
// Requires: using System; using System.Collections.Generic;
public static IEnumerable<int> ToCodepoints(this String s)
{
    var i = 0;
    while (i < s.Length)
    {
        int codepoint;
        if (Char.IsSurrogatePair(s, i)) // valid high + low surrogate pair
        {
            codepoint = Char.ConvertToUtf32(s[i], s[i + 1]);
            i += 2;
        }
        else // BMP character, or a lone surrogate passed through unchanged
        {
            codepoint = s[i];
            i += 1;
        }
        yield return codepoint;
    }
}
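For comparison, the System.Text.Encoding route mentioned above could look roughly like the sketch below. The names Utf32Codepoints and ToCodepointsViaUtf32 are hypothetical. It relies on Encoding.UTF32 being little-endian and assembles each int by hand so the result does not depend on the machine's byte order.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Utf32Codepoints
{
    // Hypothetical helper; not part of the original answer.
    public static IEnumerable<int> ToCodepointsViaUtf32(string s)
    {
        // Encoding.UTF32 uses little-endian byte order: four bytes per codepoint.
        byte[] bytes = Encoding.UTF32.GetBytes(s);
        for (int i = 0; i < bytes.Length; i += 4)
        {
            // Combine the four little-endian bytes into one codepoint value.
            yield return bytes[i]
                | (bytes[i + 1] << 8)
                | (bytes[i + 2] << 16)
                | (bytes[i + 3] << 24);
        }
    }
}
```

This allocates a byte array four times the size of the codepoint count, which is one reason to prefer walking the code units directly.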
Run-length encoding produces a Tuple of the codepoint (Item1) and the length of the run (Item2) for each subsequence of identical codepoints:
// Requires: using System; using System.Collections.Generic;
public static IEnumerable<Tuple<T, int>> RunLengthEncode<T>(this IEnumerable<T> sequence)
{
    T item = default(T); // placeholder; overwritten by the first item
    int length = 0;
    foreach (var nextItem in sequence)
    {
        if (length == 0) // first item
        {
            item = nextItem;
            length = 1;
        }
        else if (EqualityComparer<T>.Default.Equals(item, nextItem)) // continuing run (null-safe comparison)
        {
            length++;
        }
        else // run boundary
        {
            var run = Tuple.Create(item, length);
            item = nextItem;
            length = 1;
            yield return run;
        }
    }
    if (length > 0) // last run
    {
        yield return Tuple.Create(item, length);
    }
}
Upvotes: 5
Reputation: 50215
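Using the linked example, which returns the most frequent character repeated once per occurrence (the occurrences need not be adjacent in the input):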
var largest = input.GroupBy(x => x).OrderByDescending(x => x.Count()).First();
var asString = new string(largest.Key, largest.Count());
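If the consecutive-run reading of the question is what's wanted, one LINQ-flavored option is to fold over the characters with Aggregate instead of GroupBy. This is only a sketch: LongestRunLinq and LongestRun are made-up names, it needs C# 7 tuples, and it compares UTF-16 chars, so characters outside the BMP are not handled as single units.

```csharp
using System;
using System.Linq;

public static class LongestRunLinq
{
    // Hypothetical helper; not part of the original answer.
    public static string LongestRun(string input)
    {
        // State carried through the fold: the current run and the best run so far.
        var seed = (Current: '\0', CurrentLength: 0, Best: '\0', BestLength: 0);
        var result = input.Aggregate(seed, (s, c) =>
        {
            // Extend the current run or start a new one of length 1.
            int length = (s.CurrentLength > 0 && c == s.Current) ? s.CurrentLength + 1 : 1;
            if (length > s.BestLength)
            {
                return (c, length, c, length); // new best run
            }
            return (c, length, s.Best, s.BestLength); // keep the earlier best
        });
        return new string(result.Best, result.BestLength);
    }
}
```

For example, LongestRun("aabbbcc") gives "bbb", and LongestRun("aabaa") gives "aa" where the GroupBy version would give "aaaa". Ties go to the earliest run, and an empty input gives an empty string.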
Upvotes: 4