Reputation: 33
I have a matrix-building problem. To build the matrix (for a 3rd party package), I need to do it row-by-row by passing a double[] array to the 3rd-party object. Here's my problem: I have a list of objects that represent paths on a graph. Each object is a path with a 'source' property (string) and a 'destination' property (also string). I need to build a 1-dimensional array where all the elements are 0 except where the source property is equal to a given name. The given name will occur multiple times in the path list. Here's my function for building the sparse array:
static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    for (int i = 1; i < cpaths.Count; i++)
    {
        if (cpaths[i].src == nodeName) r[i] = 1;
    }
    return r;
}
Now I need to call this function about 200k times with different names. The function itself takes between 0.05 and 0.1 seconds (timed with Stopwatch). As you can imagine, even in the best case that is 0.05 seconds * 200k calls = 10,000 seconds ≈ 2.8 hours, which is too long. The list 'cpaths' contains about 200k objects.
Can someone think of a way to accomplish this in a faster way?
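For context, the calling pattern is roughly the sketch below; 'matrix.AddRow' and 'nodeNames' are placeholders for the 3rd-party API and my list of names, not the actual code:
// Rough sketch of the calling loop; matrix.AddRow and nodeNames are placeholders.
foreach (string name in nodeNames)
{
    double[] row = GetNodeSrcRow3(name); // 0.05-0.1 s per call at the moment
    matrix.AddRow(row);                  // 3rd-party row-by-row matrix builder
}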
Upvotes: 3
Views: 923
Reputation: 2710
Fantastic answers!
If I may add to the already great examples:
System.Numerics.Tensors.SparseTensor<double> GetNodeSrcRow3(string nodeName)
{
    // Requires the System.Numerics.Tensors NuGet package.
    var sparseTensor = new System.Numerics.Tensors.SparseTensor<double>(new int[] { cpaths.Count }, true, 1);
    Parallel.For(1, cpaths.Count, (i, state) =>
    {
        if (cpaths[i].src == nodeName) sparseTensor[i] = 1.0D;
    });
    return sparseTensor;
}
System.Numerics is heavily optimised and uses hardware acceleration, and it is also thread-safe, at least from what I have read about it.
For speed and scalability, a small bit of code that could make all the difference.
Upvotes: -1
Reputation: 19149
If cpaths is a normal list, it is not suitable for your case. You need a dictionary mapping each src to a list of indexes, i.e. a Dictionary<string, List<int>>.
Then you can fill the sparse array with random access. I would also suggest using a sparse list implementation for efficient memory usage rather than a memory-inefficient double[]; a good implementation is SparseAList (written by David Piepgrass).
Before generating your sparse lists, you should convert your cpaths list into a suitable dictionary. This step may take a little while (up to a few seconds), but after that you will generate your sparse lists super fast.
public static Dictionary<string, List<int>> _dictionary;

public static void CacheIndexes()
{
    _dictionary = cpaths.Select((x, i) => new { index = i, value = x })
                        .GroupBy(x => x.value.src)
                        .ToDictionary(x => x.Key, x => x.Select(a => a.index).ToList());
}
You should call CacheIndexes once before you start generating your sparse arrays.
public static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    List<int> indexes;
    if (!_dictionary.TryGetValue(nodeName, out indexes)) return r;
    foreach (var index in indexes) r[index] = 1;
    return r;
}
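To illustrate how the two pieces fit together, usage would look roughly like this (nodeNames is just a stand-in for your 200k names):
// One-time setup: build the src -> indexes lookup (a few seconds at most).
CacheIndexes();

// After that, each row costs one dictionary lookup plus a few array writes.
foreach (string name in nodeNames)
{
    double[] row = GetNodeSrcRow3(name);
    // pass 'row' to the 3rd-party matrix builder here
}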
Note that if you use SparseAList it will occupy very little space. For example, if a double array is 10K elements long and has only one index set in it, with SparseAList you will virtually have 10K items, but in fact only one item is stored in memory. It's not hard to use that collection; I suggest you give it a try.
The same code using SparseAList:
public static SparseAList<double> GetNodeSrcRow3(string nodeName)
{
    SparseAList<double> r = new SparseAList<double>();
    r.InsertSpace(0, cpaths.Count); // allocates zero memory
    List<int> indexes;
    if (!_dictionary.TryGetValue(nodeName, out indexes)) return r;
    foreach (var index in indexes) r[index] = 1;
    return r;
}
Upvotes: 4
Reputation: 86084
I can't see the rest of your code, but I suspect most of the time is spent allocating and garbage collecting all the arrays. Assuming the size of cpaths
doesn't change, you can reuse the same array.
private static double[] NodeSourceRow = null;
private static List<int> LastSetIndices = new List<int>();

static double[] GetNodeSrcRow3(string nodeName)
{
    // create the array *only* on the first call
    NodeSourceRow = NodeSourceRow ?? new double[cpaths.Count];

    // reset only the elements that were set last time
    foreach (int i in LastSetIndices) NodeSourceRow[i] = 0;
    LastSetIndices.Clear();

    // set the 1s
    for (int i = 1; i < cpaths.Count; i++)
    {
        if (cpaths[i].src == nodeName)
        {
            NodeSourceRow[i] = 1;
            LastSetIndices.Add(i);
        }
    }

    // tada!!
    return NodeSourceRow;
}
One potential drawback: if you need all the arrays in use at the same time, they will always have identical contents, since they are all the same array. But if you only use one at a time, this should be much faster.
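If you do need to keep some rows around independently, a possible workaround (assuming a plain copy is acceptable for your use case) is to clone the shared buffer before handing it off:
// Clone() on a double[] produces a full independent copy of the row.
double[] independentRow = (double[])GetNodeSrcRow3(nodeName).Clone();
Note that cloning reintroduces one allocation per call, so only do it for the rows you actually need to retain.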
Upvotes: 4
Reputation: 16991
You could make use of multi-threading using the TPL's Parallel.For
method.
static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    Parallel.For(1, cpaths.Count, (i, state) =>
    {
        if (cpaths[i].src == nodeName) r[i] = 1;
    });
    return r;
}
Upvotes: 2