Brian Vallelunga
Brian Vallelunga

Reputation: 10191

Is there a data structure that holds sets of data in .NET?

I'm looking for a data structure similar to a dictionary that returns the set of all related items to a key.

For example, I would use it like this:

var data = new FancyDataStructure();

data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});

string[] alternateNames1 = data["Betty"];
string[] alternateNames2 = data["Liz"]

In this instance, alternateNames1 would be an array containing "Liz" and "Elizabeth", and alternateNames2 would be an array containing "Elizabeth" and "Betty."

I don't want to reinvent this, but I couldn't find any examples of such a structure.

Update

Thank you to those that have written back with suggestions. Many people have suggested using some version of Dictionary<string, IEnumerable<string>>. Currently I am using this approach, but it doesn't actually fulfill the requirement without being horribly difficult to maintain. Every value in every list needs to be able to function as a key to every other value ever added to it in a set.

Thus, given the following:

data.Add(new string[] {"Elizabeth", "Liz"}
data.Add(new string[] {"Liz", "Betty"}
alternates = data["Betty"];

I would expect alternates to now contain "Elizabeth," and "Liz."

It looks as though I might just have to build such a structure to suit my needs. Keep the ideas coming though!

Brian

Upvotes: 8

Views: 405

Answers (12)

Dolphin
Dolphin

Reputation: 4762

Your problem sounds like it is really a graphing problem. Think of the names as nodes and membership in the set as the edges. From this standpoint, you would want a data structure that handles sparse graphs well, such as an adjacency list. This is, of course, similary to what you are already doing with a Dictionary<string, IEnumerable<string>> but thinking about it in this way might lead you to some helpful implementations and algorithms.

Upvotes: 1

user179437
user179437

Reputation: 713

I wrote some code, I don't know how efficient it, but it i think that it does what you want.

It is your structure

class FancyDataStructure
{
    private IDictionary<string, HashSet<string>> dictionary 
        = new Dictionary<string, HashSet<string>>();

    public void Add(params string[] names)
    {
        HashSet<string> set = new HashSet<string>(names);
        for (int i = 0; i < names.Length; i++)
        {
            if (!dictionary.ContainsKey(names[i]))
            {
                dictionary.Add(names[i], set);
            }
            else
            {
                HashSet<string> union = 
                new HashSet<string>(set.Union<string>(dictionary[names[i]]));
                set = union;
                foreach (string oldName in dictionary[names[i]])
                {
                    dictionary[oldName] = union;
                }
                for (int j = 0; j < i; j++)
                {
                    if (!dictionary.ContainsKey(names[j]))
                    {
                        dictionary.Add(names[j], union);
                    }
                }
            }
        }
    }

    public string[] this[string key]
    {
        get
        {
            List<string> result = dictionary[key].ToList<string>();
            result.Remove(key);
            return result.ToArray();
        }
    }
}

and you can use it, like this

    static void Main(string[] args)
    {

        FancyDataStructure data = new FancyDataStructure();

        data.Add("Elizabeth", "Liz");
        data.Add("Liz", "Betty");

        string[] alternates = data["Betty"];
        foreach (var item in alternates)
        {
            Console.WriteLine(item);
        }
    }

Upvotes: 0

Ian Mercer
Ian Mercer

Reputation: 39277

Or, since List is a reference type you could do the following ...

Dictionary<string, List<string>> dict = new ...

Proceed as follows:-

To add a single association (a = b) {decomposed from a list of equivalences}

Lookup a and b in the Dictionary

If neither exists

dict.Add(a, new List<string>(){a}); dict.Add(b, new List<string>(){b});

If one exists, say, a

var list = dict[a];
list.Add(b);
dict.Add(b, list);

If both exist and the lists are the same (object compare) you are done.

If both exists and the lists are different:

var list1 = dict[a];
var list2 = dict[b];
list1.AddRange(list2);
dict.Remove(b);
dict.Add(b, list1);

Upvotes: 0

Ian Mercer
Ian Mercer

Reputation: 39277

How about a pair of data structures: Dictionary<string, Guid>and Dictionary<Guid, List<string>>

To add a pair of keys (a, b) [you can decompose a larger add into pairs (1+2, 2+3, ...] proceed as follows:-

Lookup a and b in the first dictionary.
If neither exists, create a new Guid and add (a,g) and (b,g) to first dictionary and (g,List{a}) and (g,List{b}) to second dictionary.

If one exists, say a, grab the guid from it (g) and add the other (b, g) to the first dictionary and tack b onto the end of the list found at [g] in the second dictionary.

If both exist AND they have the same guid - nothing to do.

If both exist and they have different guids you need to merge the two sets // This is something most of the other proposed solutions seem to be missing // so pick a Guid to eliminate, go get it from the second dictionary, add the list of strings to the other entry and then remove this entry. Finally mark all the words in the first dictionary that were in that list.

To get all the related words, lookup the Guid in the first dictionary and grab the list from the second dictionary.

Of course a static incrementing long value would probably work better than a Guid.

Upvotes: 0

JasonTrue
JasonTrue

Reputation: 19609

The de facto alt.net standard is in Iesi.Collections, but the base class library only has HashSet<T> in dotnet 3.5 or above.

I've used "group by" like clauses in linq to easily remove duplicates from arbitrary IEnumerable<T> collections, but that doesn't quite give you set semantics.

HashSet<> is close to what you want.

Based on your requirements, I don't think there's something off the shelf that would map strings to pre-existing collections; basically, you'd have to write a class that takes a method like StoreAssociations<<T>>(IEnumerable<<T>> names), converts IEnumerable to HashSet, and iterates over each item in the HashSet to add a mapping in an IDictionary<string,HashSet<T>> to the newly-created hashset.

Upvotes: 0

Blankman
Blankman

Reputation: 267000

Try using a dictionary, something like:

Dictionary<string, List<string>>

So a dictionary of string keys with values of List

Upvotes: -1

Juliet
Juliet

Reputation: 81516

You basically have a dictionary where multiple keys map to the same value. There's no built-in data structure which supports the operation you want, but its easy to represent as a Dictionary{string, HashSet{string}} in .NET:

static void AddNames(Dictionary<string, HashSet<string>> map, params string[] names)
{
    for (int i = 0; i < names.Length; i++)
    {
        HashSet<string> value;
        if (!map.TryGetValue(names[i], out value))
        {
            value = new HashSet<string>();
            map.Add(names[i], value);
        }

        for (int j = 0; j < names.Length; j++)
        {
            value.Add(names[j]);
        }
    }
}

static void Main(string[] args)
{
    Dictionary<string, HashSet<string>> names = new Dictionary<string,HashSet<string>>();
    AddNames(names, "Chris", "Christopher");
    AddNames(names, "Christina", "Chrissy", "Chris");

    HashSet<string> relatedToChris = names["Chris"];                // gets "Chris", "Christina", "Chrissy", "Christopher";
    HashSet<string> namesRelatedToChristinia = names["Christina"];  // gets "Christina", "Chrissy", "Chris";
}

You can think of your datastructure as a directed graph where each node has an edge connected to its related name. Since there are n^2 edges, the dictionary requires O(n^2) time to insert and memory. Its not possible to reduce the lookup time to anything better.

Fortunately, since its implemented as a dictionary, lookups as still O(1). Deletes are O(m) where m is the number of values related to a key.

Upvotes: 0

Charles Hankey
Charles Hankey

Reputation: 505

Just a thought in another direction - strongly typed datasets seem to have a lot going for them. And serialized as byte arrays they are pretty fast for moving multidimensionally structured data around.

Iteration and Linq capability are sort of built in.

Maybe overkill for a lot of stuff, but I have a number of places where I stored the whole dataset in one varbinary(max) columnn in SQL.

Upvotes: 1

Jacob
Jacob

Reputation: 78848

I would just use the type Dictionary<string, IEnumerable<string>>. To build this structure from a list of lists, you could have code like this:

var alternateNames = new string[][] {
    new string[] { "Elizabeth", "Liz", "Betty" },
    new string[] { "Bob", "Robert", "Rob" }, };
var altNameLookup = 
    (
        from nameList in alternateNames
        from name in nameList
        select new { 
            Name = name, NameList = nameList.Except(new string[] { name } ) }
    ).ToDictionary(o => o.Name, o => o.NameList);

Upvotes: 0

Patrick Kafka
Patrick Kafka

Reputation: 9892

System.Collections.Generic namespace and the System.Collections are loaded with KeyValue pair dictionaries, sorted dictionaries, List Objects and much more.

System.Collections.Generic.Dictionary<int, string> dic = new Dictionary<int, string>();
        dic.Add(1, test);

or a nested list inside a dictionary

Dictionary<string, List<string>> dic = new Dictionary<string, List<string>>();
List<string> alternatives = new List<string>();
alternatives.Add("Brenda");
dic.Add("Betty", alternatives);

Upvotes: 1

Jared Updike
Jared Updike

Reputation: 7277

I use this:

It has a generic Set<a> type and implements all of the lovely iterators, .Contains, .Count, etc.

Upvotes: -1

Paul Creasey
Paul Creasey

Reputation: 28834

Something like this seems simple enough.

var data = new List<string[]>();

data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});

var alternateNames1 = data.Where(x =>x.Contains("Betty")).Select(x => x.Where(y => y != "Betty"));

Upvotes: 0

Related Questions