RDScience

Reputation: 33

faster algorithm or technique for building sparse arrays in c#

I have a matrix-building problem. To build the matrix (for a 3rd party package), I need to do it row-by-row by passing a double[] array to the 3rd-party object. Here's my problem: I have a list of objects that represent paths on a graph. Each object is a path with a 'source' property (string) and a 'destination' property (also string). I need to build a 1-dimensional array where all the elements are 0 except where the source property is equal to a given name. The given name will occur multiple times in the path list. Here's my function for building the sparse array:

    static double[] GetNodeSrcRow3(string nodeName)
    {
        double[] r = new double[cpaths.Count];
        for (int i = 1; i < cpaths.Count; i++)
        {
            if (cpaths[i].src == nodeName) r[i] = 1;
        }
        return r;
    }

Now I need to call this function about 200k times with different names. The function itself takes between 0.05 and 0.1 seconds (timed with Stopwatch). As you can imagine, even in the best case that is 0.05 seconds × 200k calls = 10,000 seconds ≈ 2.8 hours, which is too long. The list 'cpaths' contains about 200k objects.

Can someone think of a way to accomplish this in a faster way?

Upvotes: 3

Views: 923

Answers (4)

Rusty Nail

Reputation: 2710

Fantastic Answers!

If I may add to the already great examples:

System.Numerics.Tensors.SparseTensor<double> GetNodeSrcRow3(string nodeName)
{
    // A quick NuGet System.Numerics.Tensors install:
    System.Numerics.Tensors.SparseTensor<double> sparseTensor =
        new System.Numerics.Tensors.SparseTensor<double>(new int[] { cpaths.Count }, true, 1);

    Parallel.For(1, cpaths.Count, (i, state) =>
    {
        if (cpaths[i].src == nodeName) sparseTensor[i] = 1.0D;
    });

    return sparseTensor;
}

From what I have read, System.Numerics is heavily optimised, uses hardware acceleration, and is thread-safe.

For speed and scalability, a small bit of code can make all the difference.

Upvotes: -1

M.kazem Akhgary

Reputation: 19149

If cpaths is a normal list, then it is not suitable for your case. You need a dictionary mapping each src to a list of indexes, like Dictionary<string, List<int>>.

Then you can fill the sparse array with random access. I would also suggest using a sparse list implementation for efficient memory usage, rather than the memory-inefficient double[]. A good implementation is SparseAList (written by David Piepgrass).

Before generating your sparse lists, you should convert your cpaths list into a suitable dictionary. This step may take a little while (up to a few seconds), but after that you will generate your sparse lists very fast.

public static Dictionary<string, List<int>> _dictionary;

public static void CacheIndexes()
{
    _dictionary = cpaths.Select((x, i) => new { index = i, value = x })
                        .GroupBy(x => x.value.src)
                        .ToDictionary(x => x.Key, x => x.Select(a => a.index).ToList());
}

you should call CacheIndexes before starting to generate your sparse arrays.

public static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    List<int> indexes;
    if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;

    foreach(var index in indexes) r[index] = 1;

    return r;
}

Note that if you use SparseAList it will occupy very little space. For example, if the double array is 10K in length and has only one index set in it, with SparseAList you will virtually have 10K items, but in fact only one item is stored in memory. It's not hard to use that collection; I suggest you give it a try.

The same code using SparseAList:

public static SparseAList<double> GetNodeSrcRow3(string nodeName)
{
    SparseAList<double> r = new SparseAList<double>();

    r.InsertSpace(0, cpaths.Count); // allocates zero memory.

    List<int> indexes;
    if(!_dictionary.TryGetValue(nodeName, out indexes)) return r;

    foreach(var index in indexes) r[index] = 1;

    return r;
}
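
To tie it together, a rough usage sketch could look like the following; nodeNames and thirdPartyMatrix.AddRow are hypothetical placeholders for however you enumerate names and feed rows to the 3rd-party package:

// Hypothetical usage sketch: build the index cache once, then generate rows.
// 'nodeNames' and 'thirdPartyMatrix.AddRow' are placeholders, not real APIs.
CacheIndexes();

foreach (string name in nodeNames)
{
    var row = GetNodeSrcRow3(name); // now a cheap dictionary lookup per call
    thirdPartyMatrix.AddRow(row);
}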

Upvotes: 4

recursive

Reputation: 86084

I can't see the rest of your code, but I suspect most of the time is spent allocating and garbage collecting all the arrays. Assuming the size of cpaths doesn't change, you can reuse the same array.

private static double[] NodeSourceRow = null;
private static List<int> LastSetIndices = new List<int>();

static double[] GetNodeSrcRow3(string nodeName) {
    // create new array *only* on the first call
    NodeSourceRow = NodeSourceRow ?? new double[cpaths.Count];

    // reset all elements to 0
    foreach(int i in LastSetIndices) NodeSourceRow[i] = 0;
    LastSetIndices.Clear();

    // set the 1s
    for (int i = 1; i < cpaths.Count; i++) {
        if (cpaths[i].src == nodeName) {
            NodeSourceRow[i] = 1;
            LastSetIndices.Add(i);
        }
    }

    // tada!!
    return NodeSourceRow;
}

One potential drawback would be if you need all the arrays to be used at the same time; they will always have identical contents. But if you only use one at a time, this should be much faster.
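
As a rough sketch of the intended calling pattern (nodeNames and thirdPartyMatrix.AddRow are hypothetical placeholders), each returned row must be consumed before the next call, since every call overwrites the same shared array:

// Hypothetical usage: hand each row to the 3rd-party package before the
// next call, because GetNodeSrcRow3 reuses (and overwrites) the same array.
foreach (string name in nodeNames)
{
    double[] row = GetNodeSrcRow3(name);
    thirdPartyMatrix.AddRow(row); // consume/copy here, before the next call
}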

Upvotes: 4

Bradley Uffner

Reputation: 16991

You could make use of multi-threading with the TPL's Parallel.For method.

static double[] GetNodeSrcRow3(string nodeName)
{
    double[] r = new double[cpaths.Count];
    Parallel.For(1, cpaths.Count, (i, state) =>
        {
            if (cpaths[i].src == nodeName) r[i] = 1;
        });
    return r;
}

Upvotes: 2
