Reputation: 1278
I added multithreading part to my code .
public class ThreadClassSeqGroups
{
public Dictionary<string, string> seqGroup;
public Dictionary<string, List<SearchAlgorithm.CandidateStr>> completeModels;
public Dictionary<string, List<SearchAlgorithm.CandidateStr>> partialModels;
private Thread nativeThread;
public ThreadClassSeqGroups(Dictionary<string, string> seqs)
{
seqGroup = seqs;
completeModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
partialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
}
public void Run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu,
List<ushort> secondarySeedOrder, double partialCutoff)
{
nativeThread = new Thread(() => this._run(dsd, primarySeedSu, secondarySeedOrder, partialCutoff));
nativeThread.Priority = ThreadPriority.Highest;
nativeThread.Start();
}
public void _run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu,
List<ushort> secondarySeedOrder, double partialCutoff)
{
int groupSize = this.seqGroup.Count;
int seqCount = 0;
foreach (KeyValuePair<string, string> p in seqGroup)
{
Console.WriteLine("ThreadID {0} (priority:{1}):\t#{2}/{3} SeqName: {4}",
nativeThread.ManagedThreadId, nativeThread.Priority.ToString(), ++seqCount, groupSize, p.Key);
List<SearchAlgorithm.CandidateStr> tmpCompleteModels, tmpPartialModels;
SearchAlgorithm.SearchInBothDirections(
p.Value.ToUpper().Replace('T', 'U'), dsd, primarySeedSu, secondarySeedOrder, partialCutoff,
out tmpCompleteModels, out tmpPartialModels);
completeModels.Add(p.Key, tmpCompleteModels);
partialModels.Add(p.Key, tmpPartialModels);
}
}
public void Join()
{
nativeThread.Join();
}
}
class Program
{
public static int _paramSeqGroupSize = 2000;
static void Main(Dictionary<string, string> rawSeqs)
{
// Split the whole rawSeqs (Dict<name, seq>) into several groups
Dictionary<string, string>[] rawSeqGroups = SplitSeqFasta(rawSeqs, _paramSeqGroupSize);
// Create a thread for each seqGroup and run
var threadSeqGroups = new MultiThreading.ThreadClassSeqGroups[rawSeqGroups.Length];
for (int i = 0; i < rawSeqGroups.Length; i++)
{
threadSeqGroups[i] = new MultiThreading.ThreadClassSeqGroups(rawSeqGroups[i]);
//threadSeqGroups[i].SetPriority();
threadSeqGroups[i].Run(dsd, primarySeedSu, secondarySeedOrder, _paramPartialCutoff);
}
// Merge results from threads after the thread finish
var allCompleteModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
var allPartialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
foreach (MultiThreading.ThreadClassSeqGroups t in threadSeqGroups)
{
t.Join();
foreach (string name in t.completeModels.Keys)
{
allCompleteModels.Add(name, t.completeModels[name]);
}
foreach (string name in t.partialModels.Keys)
{
allPartialModels.Add(name, t.partialModels[name]);
}
}
}
}
However, the speed with multiple threads is much slower than single thread, and the CPU load is generally <10%.
For example:
The input file contain 2500 strings
_paramGroupSize = 3000, main thread + 1 calculation thread cost 200 sec
_paramGroupSize = 400, main thread + 7 calculation threads cost much more time (I killed it after over 10 mins run).
Is there any problem with my implementation? How to speed it up?
Thanks.
Upvotes: 0
Views: 138
Reputation: 62439
It seems to me that you are trying to process a file in parallel with multiple threads. This is a bad idea, assuming you have a single mechanical disk.
Basically, the head of the disk needs to seek the next reading location for each read request. This is a costly operation and since multiple threads issue read commands it means the head gets bounced around as each thread gets its turn to run. This will drastically reduce performance compared to the case where a single thread is doing the reading.
Upvotes: 3
Reputation: 35881
When threads are run they are given time on a specific processor. if there are more threads than processors, the system context switches between threads to get all active threads some time to process. Context switching is really expensive. If you have more threads than processors most of the CPU time can be take up by context switching and make a single-threaded solution look faster than a multi thread solution.
Your example shows starting an indeterminate number of threads. if SplitSeqFasta
returns more entries than cores, you will create more threads and cores and introduce a lot of context switching.
I suggest you throttle the number of threads manually, or use something like the thread parallel library and the Parallel class to have it automatically throttle for you.
Upvotes: 0
Reputation: 34198
What was the code prior to multithreading? It's hard to tell what this code is doing, and much of the "working" code seems to be hidden in your search algorithm. However, some thoughts:
DescrStrDetail
which is passed from variable dsd
in your Main
method to every child thread. However, the declaration of this variable is missing and so its usage/implementation is unknown. If this variable has locks that prevent multiple threads accessing at the same time, then your multiple threads will potentially be locking eachother out of this data, further slowing performance.Upvotes: 0