Reputation: 51

best data structure for storing large number of numeric fields

I am working with a class, say Widget, that has a large number of numeric real-world attributes (e.g., height, length, weight, cost, etc.). There are different types of widgets (sprockets, cogs, etc.), but every widget shares exactly the same attributes (the values differ by widget, of course, but they all have a height, weight, etc.). I have 1,000s of each type of widget (1,000 cogs, 1,000 sprockets, etc.).

I need to perform a lot of calculations on these attributes (say calculating the weighted average of the attributes for 1000s of different widgets). For the weighted averages, I have different weights for each widget type (ie, I may care more about length for sprockets than for cogs).

Right now, I am storing all the attributes in a Dictionary<string, double> within each widget (the widgets have an enum that specifies their type: cog, sprocket, etc.). I then have some calculator classes that store weights for each attribute as a Dictionary<WidgetType, Dictionary<string, double>>. To calculate the weighted average for each widget, I simply iterate through its attribute dictionary keys like:

double weightedAvg = 0.0;
foreach (string attributeName in widget.Attributes.Keys)
{
    double attributeValue = widget.Attributes[attributeName];
    double attributeWeight = calculator.Weights[widget.Type][attributeName];
    weightedAvg += (attributeValue * attributeWeight);
}

So this works fine and is pretty readable and easy to maintain, but is very slow for 1000s of widgets based on some profiling. My universe of attribute names is known and will not change during the life of the application, so I am wondering what some better options are. The few I can think of:

1) Store attribute values and weights in double []s. I think this is probably the most efficient option, but then I need to make sure the arrays are always stored in the correct order between widgets and calculators. This also decouples the data from the metadata so I will need to store an array (?) somewhere that maps between the attribute names and the index into double [] of attribute values and weights.

2) Store attribute values and weights in immutable structs. I like this option because I don't have to worry about the ordering and the data is "self-documenting". But is there an easy way to loop over these attributes in code? I have almost 100 attributes, so I don't want to hardcode all of them. I could use reflection, but I worry that this would cause an even larger performance hit, since I am looping over so many widgets and would have to use reflection on each one.

Any other alternatives?

Upvotes: 4

Answers (4)

user1719287

Reputation: 61

To calculate weighted averages without looping or reflection, one option is to compute each attribute's weighted contribution when the attribute is created and accumulate it in a shared store. This happens while you are creating the widget instance. The following sample code needs to be adapted to your needs. For further processing of the widgets themselves, you can use data parallelism; see my other response in this thread.

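using System.Collections.Generic;

public enum WidgetType { Cog, Sprocket }

public static class Calculator
{
    // Per-type attribute weights. The numbers are placeholders only.
    public static readonly Dictionary<WidgetType, Dictionary<string, double>> Weights =
        new Dictionary<WidgetType, Dictionary<string, double>>
        {
            { WidgetType.Cog, new Dictionary<string, double> { { "height", 0.5 }, { "weight", 0.5 } } }
        };
}

public static class WeightStore
{
    // Running weighted sum per widget id.
    static readonly Dictionary<int, double> widgetWeightedAvg = new Dictionary<int, double>();

    public static void AttWeightedAvgAvailable(double attWeightedAvg, int widgetId)
    {
        if (widgetWeightedAvg.ContainsKey(widgetId))
            widgetWeightedAvg[widgetId] += attWeightedAvg;
        else
            widgetWeightedAvg[widgetId] = attWeightedAvg;
    }
}

public class WidgetAttribute
{
    public string Name { get; }
    public double Value { get; }

    // Creating an attribute immediately adds its weighted value to the store.
    public WidgetAttribute(string name, double value, WidgetType type, int widgetId)
    {
        Name = name;
        Value = value;
        double attWeight = Calculator.Weights[type][name];
        WeightStore.AttWeightedAvgAvailable(Value * attWeight, widgetId);
    }
}

public class CogWidget
{
    public int Id { get; set; }
    public WidgetAttribute height { get; set; }
    public WidgetAttribute weight { get; set; }
}

public class Client
{
    public void BuildCogWidgets()
    {
        CogWidget widget = new CogWidget();
        widget.Id = 1;
        widget.height = new WidgetAttribute("height", 12.22, WidgetType.Cog, widget.Id);
    }
}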

Upvotes: 1

user1719287

Reputation: 61

Use the data parallelism supported by .NET 4 and above.

https://msdn.microsoft.com/en-us/library/dd537608(v=vs.110).aspx

An excerpt from the above link:

When a parallel loop runs, the TPL partitions the data source so that the loop can operate on multiple parts concurrently. Behind the scenes, the Task Scheduler partitions the task based on system resources and workload. When possible, the scheduler redistributes work among multiple threads and processors if the workload becomes unbalanced.
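
A minimal sketch of how that might apply to the question, assuming each widget's values and the weights for its type are already held in double[] arrays in the same order. The ParallelScoring class and WeightedSums method are hypothetical names, not from the question:

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelScoring
{
    // values[w] holds the attribute values of widget w;
    // weights holds one weight per attribute, in the same order (an assumption).
    public static double[] WeightedSums(double[][] values, double[] weights)
    {
        var results = new double[values.Length];

        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, values.Length, w =>
        {
            double sum = 0.0;
            double[] row = values[w];
            for (int i = 0; i < row.Length; i++)
            {
                sum += row[i] * weights[i];
            }
            results[w] = sum;
        });

        return results;
    }
}
```

With only a few thousand widgets and roughly 100 attributes each, the work per iteration is small, so it is worth measuring whether this actually beats a plain loop before adopting it.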

Upvotes: 0

Jim Mischel

Reputation: 134015

Three possibilities come immediately to mind. The first, which I think you rejected too readily, is to have individual fields in your class. That is, individual double values named height, length, weight, cost, etc. You're right that it would be more code to do the calculations, but you wouldn't have the indirection of dictionary lookup.
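
As a rough sketch of that first option, with only four of the attributes shown and hypothetical class names (FieldWidget, AttributeWeights), the calculation would be written out by hand:

```csharp
public sealed class AttributeWeights
{
    public double Height, Length, Weight, Cost;
}

public sealed class FieldWidget
{
    public double Height, Length, Weight, Cost;

    // Every attribute is spelled out explicitly. With ~100 attributes this
    // method grows accordingly, but no lookup happens at run time.
    public double WeightedSum(AttributeWeights w)
    {
        return Height * w.Height
             + Length * w.Length
             + Weight * w.Weight
             + Cost * w.Cost;
    }
}
```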

Second is to ditch the dictionary in favor of an array. So rather than a Dictionary<string, double>, you'd just have a double[]. Again, I think you rejected this too quickly. You can easily replace the string dictionary keys with an enumeration. So you'd have:

enum WidgetProperty
{
    First = 0,
    Height = 0,
    Length = 1,
    Weight = 2,
    Cost = 3,
    ...
    Last = 100
}

Given that and an array of double, you can easily go through all of the values for each instance:

for (int i = (int)WidgetProperty.First; i < (int)WidgetProperty.Last; ++i)
{
    double attributeValue = widget.Attributes[i];
    double attributeWeight = calculator.Weights[widget.Type][i];
    weightedAvg += (attributeValue * attributeWeight);
}

Direct array access is going to be significantly faster than accessing a dictionary by string.
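
Put together, the array approach might look something like the sketch below. The enum is deliberately shortened, and ArrayWidget, ArrayCalculator, and the Count member are illustrative names rather than anything from the question. It also assumes the weights are stored as one double[] per widget type, so the dictionary is consulted once per widget instead of once per attribute:

```csharp
using System.Collections.Generic;

public enum WidgetType { Cog, Sprocket }

// Hypothetical, shortened attribute list; Count doubles as the array length.
public enum WidgetProperty { Height = 0, Length = 1, Weight = 2, Cost = 3, Count = 4 }

public sealed class ArrayWidget
{
    public WidgetType Type;
    public readonly double[] Attributes = new double[(int)WidgetProperty.Count];

    // Named access keeps call sites readable despite the array storage.
    public double this[WidgetProperty p]
    {
        get { return Attributes[(int)p]; }
        set { Attributes[(int)p] = value; }
    }
}

public static class ArrayCalculator
{
    public static double WeightedSum(ArrayWidget widget, Dictionary<WidgetType, double[]> weights)
    {
        // One dictionary lookup per widget, then plain array indexing.
        double[] w = weights[widget.Type];
        double sum = 0.0;
        for (int i = 0; i < (int)WidgetProperty.Count; i++)
        {
            sum += widget.Attributes[i] * w[i];
        }
        return sum;
    }
}
```

The indexer means code outside the hot loop can still say widget[WidgetProperty.Height], which addresses the concern about separating the data from its names.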

Finally, you can optimize your dictionary access a little bit. Rather than doing a foreach on the keys and then doing a dictionary lookup, do a foreach on the dictionary itself:

foreach (KeyValuePair<string, double> kvp in widget.Attributes)
{
    double attributeValue = kvp.Value;
    double attributeWeight = calculator.Weights[widget.Type][kvp.Key];
    weightedAvg += (attributeValue * attributeWeight);
}
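
Going one step further with the same idea, the weights for the widget's type can be fetched once, outside the loop, so that only a single string lookup remains per attribute. A minimal sketch, where DictionaryCalculator is a hypothetical name and the caller is assumed to pass calculator.Weights[widget.Type] as typeWeights:

```csharp
using System.Collections.Generic;

public static class DictionaryCalculator
{
    public static double WeightedSum(
        Dictionary<string, double> attributes,
        Dictionary<string, double> typeWeights)
    {
        double sum = 0.0;
        foreach (KeyValuePair<string, double> kvp in attributes)
        {
            double weight;
            // Attributes without a configured weight are skipped here.
            // That policy is an assumption; the question does not specify one.
            if (typeWeights.TryGetValue(kvp.Key, out weight))
            {
                sum += kvp.Value * weight;
            }
        }
        return sum;
    }
}
```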

Upvotes: 4

Norbert

Reputation: 6084

As is always the case with data normalization, the normalization level you choose determines a good part of the performance. It looks like you would have to move from your current model to another model, or to a mix of the two.

Better performance for your scenario is possible if you do this processing in the database rather than on the C# side. You then get the benefit of indexes, no data transfer except for the wanted result, and the hundreds of thousands of man-hours already spent on performance optimization.

Upvotes: 0
