Reputation: 299
As part of my plan to improve my programming skills, I have decided to attempt to sort a large array of strings lexicographically (which later on I plan to do so using threading). I have researched different sorting algorithms and I have tried to implement a Merge Sort my self from what I understood. For now I plan to sort a few simple strings.
I am inputting the following strings to be sorted in the below method:
string[] stringsArray = new string[] { "cixymn", "adfxij", "adxhxy", "abcdef", "iejfyq", "uqbzxo", "aaaaaa" };
string[] stringSorted = MergeSort(stringsArray);
// For display purposes
foreach (string s in stringSorted)
{
Console.WriteLine("Item at index " + Array.IndexOf(stringSorted, s) + " is " + s);
}
The result I am getting is the following:
Item at index 0 is aaaaaa
Item at index 1 is abcdef
Item at index 2 is adfxij
Item at index 3 is uqbzxo
Item at index 4 is
Item at index 4 is
Item at index 4 is
Since to implement the merge sort, you must first divide the array into two, I can easily understand that in this case, the left part is being sorted successfully whereas the right part is being ignored. I am under the impression that this is happening because I am comparing the characters of each string from the left side of the array in every recursion (thus possibly ignoring the right). So I think I actually understood where the problem MIGHT be. However, I do not quite know how to go about this. Any help would be very greatly appreciated.
Below is the code for the MergeSort method.
private static string[] MergeSort(string[] stringsArray)
{
if (stringsArray.Length == 1)
{
return stringsArray;
}
int middle = stringsArray.Length / 2;
string[] left = new string[middle];
string[] right = new string[stringsArray.Length - middle];
for (int i = 0; i < middle; i++)
{
left[i] = stringsArray[i];
}
for (int i = 0; i < stringsArray.Length - middle; i++)
{
right[i] = stringsArray[i + middle];
}
left = MergeSort(left);
right = MergeSort(right);
int leftPointer = 0;
int rightPointer = 0;
string[] sorted = new string[stringsArray.Length];
for (int k = 0; k < stringsArray.Length; k++)
{
if (k == left.Length)
{
break;
}
for (int i = 0; i < left[leftPointer].Count(); i++)
{
var leftChar = left[leftPointer][i];
if (i == right[rightPointer].Count())
{
continue;
}
var rightChar = right[rightPointer][i];
if ((rightPointer == right.Length || leftPointer < left.Length) && leftChar < rightChar)
{
sorted[k] = left[leftPointer];
sorted[k + 1] = right[rightPointer];
leftPointer++;
break;
}
if ((leftPointer == left.Length || rightPointer < right.Length) && (rightChar < leftChar))
{
sorted[k] = right[rightPointer];
sorted[k + 1] = left[leftPointer];
rightPointer++;
break;
}
}
}
QUESTION #2 : How would you recommend optimizing the code in order to be able to use threading?
Upvotes: 2
Views: 2842
Reputation: 5576
Here is a working solution. MergeSort is the most basic version, ThreadedMergeSort uses tasks and optimizes trivial cases. The simple version is about 30% slower than the .Sort() method (which is quicksort I think) on my machine, while the threaded version is about twice as fast.
static List<T> MergeSort<T>(List<T> input) where T: IComparable
{
var length = input.Count;
if (length < 2)
return input;
var left = MergeSort(input.GetRange(0, length / 2));
var right = MergeSort(input.GetRange(length / 2, length - length / 2));
var result = new List<T>();
for (int leftIndex = 0, leftLength = left.Count, rightLength = right.Count, rightIndex = 0; leftIndex + rightIndex < length;)
{
if (rightIndex >= rightLength || leftIndex < leftLength && left[leftIndex].CompareTo(right[rightIndex]) <= 0)
result.Add(left[leftIndex++]);
else
result.Add(right[rightIndex++]);
}
return result;
}
static List<T> ThreadedMergeSort<T>(List<T> input) where T : IComparable
{
var length = input.Count;
if (length < 2)
return input;
// this next part can be commented out if you want a "pure" mergesort, but it
// doesn't make sense to merge sort very short sublists.
if (length < 10)
{
for (int i = 0; i < length - 1; i++)
for (int j = i + 1; j < length; j++)
if (input[i].CompareTo(input[j]) > 0)
{
var tmp = input[i];
input[i] = input[j];
input[j] = tmp;
}
return input;
}
List<T> left, right;
if (length > 10000)
{
var taskLeft = Task<List<T>>.Factory.StartNew(() => { return ThreadedMergeSort(input.GetRange(0, length / 2)); });
var taskRight = Task<List<T>>.Factory.StartNew(() => { return ThreadedMergeSort(input.GetRange(length / 2, length - length / 2)); });
taskLeft.Wait();
taskRight.Wait();
left = taskLeft.Result;
right = taskRight.Result;
}
else
{
left = ThreadedMergeSort(input.GetRange(0, length / 2));
right = ThreadedMergeSort(input.GetRange(length / 2, length - length / 2));
}
var result = new List<T>();
for (int leftIndex = 0, leftLength = left.Count, rightLength = right.Count, rightIndex = 0; leftIndex + rightIndex < length; )
{
if (rightIndex >= rightLength || leftIndex < leftLength && left[leftIndex].CompareTo(right[rightIndex]) <= 0)
result.Add(left[leftIndex++]);
else
result.Add(right[rightIndex++]);
}
return result;
}
Upvotes: 1
Reputation: 35706
Heres an answer I cribbed, genericised and brought up to date from here
public static IList<T> MergeSort<T>(
this IList<T> unsorted,
IComparer<T> comparer = null)
{
if (unsorted == null || unsorted.Count < 2)
{
return unsorted;
}
if (comparer == null)
{
comparer = Comparer<T>.Default;
}
IList<T> sorted = new List<T>();
int middle = (int)unsorted.Count/2;
Ilist<T> left = unsorted.GetRange(0, middle);
IList<T> right = unsorted.GetRange(middle, unsorted.Count - middle);
var sortLeft = Task<IList<T>>.Factory.StartNew(
() => left.MergeSort(comparer));
var sortRight = Task<IList<T>>.Factory.StartNew(
() => right.MergeSort(comparer));
left = sortLeft.Result;
right = sortRight.Result;
int leftPtr = 0;
int rightPtr = 0;
for (int i = 0; i < left.Count + right.Count; i++)
{
if (leftPtr == left.Count)
{
sorted.Add(right[rightPtr]);
rightPtr++;
}
else if (rightPtr == right.Count)
{
sorted.Add(left[leftPtr]);
leftPtr++;
}
else if (comparer.Compare(left[leftPtr], right[rightPtr]) < 0)
{
sorted.Add(left[leftPtr]);
leftPtr++;
}
else
{
sorted.Add(right[rightPtr]);
rightPtr++;
}
}
return sorted;
}
This code, will use the default IComparer<T>
unless you pass your own.
As you can see this code self iterates on each half of the unsorted array, I've added some code using the Task
TPL class to run those calls asynchronously on seperate threads.
You could use the code like this,
var strings = new List<string>
{
"cixymn",
"adfxij",
"adxhxy",
"abcdef",
"iejfyq",
"uqbzxo",
"aaaaaa"
};
var sortedStrings = strings.MergeSort();
If the default string comparer is not lexicographical enough for you, you could instantiate and pass your a selected StringComparer
, perhaps like this,
var sortedStrings = strings.MergeSort(StringComparer.OrdinalIgnoreCase);
In the unlikely event that none of the StringComparer
s meet your requirements, you could
write your own implementation of IComparer<string>
and pass that to the MergeSort
function instead.
In any case, it makes sense to keep the merge sort generic and resuable for all types and pass the specialization into the function.
Upvotes: 1