Reputation: 10191
I'm looking for a data structure similar to a dictionary that returns the set of all related items to a key.
For example, I would use it like this:
var data = new FancyDataStructure();
data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});
string[] alternateNames1 = data["Betty"];
string[] alternateNames2 = data["Liz"]
In this instance, alternateNames1 would be an array containing "Liz" and "Elizabeth", and alternateNames2 would be an array containing "Elizabeth" and "Betty."
I don't want to reinvent this, but I couldn't find any examples of such a structure.
Update
Thank you to those that have written back with suggestions. Many people have suggested using some version of Dictionary<string, IEnumerable<string>>
. Currently I am using this approach, but it doesn't actually fulfill the requirement without being horribly difficult to maintain. Every value in every list needs to be able to function as a key to every other value ever added to it in a set.
Thus, given the following:
data.Add(new string[] {"Elizabeth", "Liz"}
data.Add(new string[] {"Liz", "Betty"}
alternates = data["Betty"];
I would expect alternates to now contain "Elizabeth," and "Liz."
It looks as though I might just have to build such a structure to suit my needs. Keep the ideas coming though!
Brian
Upvotes: 8
Views: 405
Reputation: 4762
Your problem sounds like it is really a graphing problem. Think of the names as nodes and membership in the set as the edges. From this standpoint, you would want a data structure that handles sparse graphs well, such as an adjacency list. This is, of course, similary to what you are already doing with a Dictionary<string, IEnumerable<string>>
but thinking about it in this way might lead you to some helpful implementations and algorithms.
Upvotes: 1
Reputation: 713
I wrote some code, I don't know how efficient it, but it i think that it does what you want.
It is your structure
class FancyDataStructure
{
private IDictionary<string, HashSet<string>> dictionary
= new Dictionary<string, HashSet<string>>();
public void Add(params string[] names)
{
HashSet<string> set = new HashSet<string>(names);
for (int i = 0; i < names.Length; i++)
{
if (!dictionary.ContainsKey(names[i]))
{
dictionary.Add(names[i], set);
}
else
{
HashSet<string> union =
new HashSet<string>(set.Union<string>(dictionary[names[i]]));
set = union;
foreach (string oldName in dictionary[names[i]])
{
dictionary[oldName] = union;
}
for (int j = 0; j < i; j++)
{
if (!dictionary.ContainsKey(names[j]))
{
dictionary.Add(names[j], union);
}
}
}
}
}
public string[] this[string key]
{
get
{
List<string> result = dictionary[key].ToList<string>();
result.Remove(key);
return result.ToArray();
}
}
}
and you can use it, like this
static void Main(string[] args)
{
FancyDataStructure data = new FancyDataStructure();
data.Add("Elizabeth", "Liz");
data.Add("Liz", "Betty");
string[] alternates = data["Betty"];
foreach (var item in alternates)
{
Console.WriteLine(item);
}
}
Upvotes: 0
Reputation: 39277
Or, since List is a reference type you could do the following ...
Dictionary<string, List<string>> dict = new ...
Proceed as follows:-
To add a single association (a = b) {decomposed from a list of equivalences}
Lookup a and b in the Dictionary
If neither exists
dict.Add(a, new List<string>(){a}); dict.Add(b, new List<string>(){b});
If one exists, say, a
var list = dict[a];
list.Add(b);
dict.Add(b, list);
If both exist and the lists are the same (object compare) you are done.
If both exists and the lists are different:
var list1 = dict[a];
var list2 = dict[b];
list1.AddRange(list2);
dict.Remove(b);
dict.Add(b, list1);
Upvotes: 0
Reputation: 39277
How about a pair of data structures: Dictionary<string, Guid>
and Dictionary<Guid, List<string>>
To add a pair of keys (a, b) [you can decompose a larger add into pairs (1+2, 2+3, ...] proceed as follows:-
Lookup a and b in the first dictionary.
If neither exists, create a new Guid and add (a,g) and (b,g) to first dictionary and (g,List{a}) and (g,List{b}) to second dictionary.
If one exists, say a, grab the guid from it (g) and add the other (b, g) to the first dictionary and tack b onto the end of the list found at [g] in the second dictionary.
If both exist AND they have the same guid - nothing to do.
If both exist and they have different guids you need to merge the two sets // This is something most of the other proposed solutions seem to be missing // so pick a Guid to eliminate, go get it from the second dictionary, add the list of strings to the other entry and then remove this entry. Finally mark all the words in the first dictionary that were in that list.
To get all the related words, lookup the Guid in the first dictionary and grab the list from the second dictionary.
Of course a static incrementing long value would probably work better than a Guid.
Upvotes: 0
Reputation: 19609
The de facto alt.net standard is in Iesi.Collections, but the base class library only has HashSet<T>
in dotnet 3.5 or above.
I've used "group by" like clauses in linq to easily remove duplicates from arbitrary IEnumerable<T>
collections, but that doesn't quite give you set semantics.
HashSet<> is close to what you want.
Based on your requirements, I don't think there's something off the shelf that would map strings to pre-existing collections; basically, you'd have to write a class that takes a method like StoreAssociations<<T>>(IEnumerable<<T>> names)
, converts IEnumerable to HashSet, and iterates over each item in the HashSet to add a mapping in an IDictionary<string,HashSet<T>>
to the newly-created hashset.
Upvotes: 0
Reputation: 267000
Try using a dictionary, something like:
Dictionary<string, List<string>>
So a dictionary of string keys with values of List
Upvotes: -1
Reputation: 81516
You basically have a dictionary where multiple keys map to the same value. There's no built-in data structure which supports the operation you want, but its easy to represent as a Dictionary{string, HashSet{string}}
in .NET:
static void AddNames(Dictionary<string, HashSet<string>> map, params string[] names)
{
for (int i = 0; i < names.Length; i++)
{
HashSet<string> value;
if (!map.TryGetValue(names[i], out value))
{
value = new HashSet<string>();
map.Add(names[i], value);
}
for (int j = 0; j < names.Length; j++)
{
value.Add(names[j]);
}
}
}
static void Main(string[] args)
{
Dictionary<string, HashSet<string>> names = new Dictionary<string,HashSet<string>>();
AddNames(names, "Chris", "Christopher");
AddNames(names, "Christina", "Chrissy", "Chris");
HashSet<string> relatedToChris = names["Chris"]; // gets "Chris", "Christina", "Chrissy", "Christopher";
HashSet<string> namesRelatedToChristinia = names["Christina"]; // gets "Christina", "Chrissy", "Chris";
}
You can think of your datastructure as a directed graph where each node has an edge connected to its related name. Since there are n^2 edges, the dictionary requires O(n^2) time to insert and memory. Its not possible to reduce the lookup time to anything better.
Fortunately, since its implemented as a dictionary, lookups as still O(1). Deletes are O(m) where m is the number of values related to a key.
Upvotes: 0
Reputation: 505
Just a thought in another direction - strongly typed datasets seem to have a lot going for them. And serialized as byte arrays they are pretty fast for moving multidimensionally structured data around.
Iteration and Linq capability are sort of built in.
Maybe overkill for a lot of stuff, but I have a number of places where I stored the whole dataset in one varbinary(max) columnn in SQL.
Upvotes: 1
Reputation: 78848
I would just use the type Dictionary<string, IEnumerable<string>>
. To build this structure from a list of lists, you could have code like this:
var alternateNames = new string[][] {
new string[] { "Elizabeth", "Liz", "Betty" },
new string[] { "Bob", "Robert", "Rob" }, };
var altNameLookup =
(
from nameList in alternateNames
from name in nameList
select new {
Name = name, NameList = nameList.Except(new string[] { name } ) }
).ToDictionary(o => o.Name, o => o.NameList);
Upvotes: 0
Reputation: 9892
System.Collections.Generic namespace and the System.Collections are loaded with KeyValue pair dictionaries, sorted dictionaries, List Objects and much more.
System.Collections.Generic.Dictionary<int, string> dic = new Dictionary<int, string>();
dic.Add(1, test);
or a nested list inside a dictionary
Dictionary<string, List<string>> dic = new Dictionary<string, List<string>>();
List<string> alternatives = new List<string>();
alternatives.Add("Brenda");
dic.Add("Betty", alternatives);
Upvotes: 1
Reputation: 7277
I use this:
It has a generic Set<a> type and implements all of the lovely iterators, .Contains, .Count, etc.
Upvotes: -1
Reputation: 28834
Something like this seems simple enough.
var data = new List<string[]>();
data.Add(new string[] {"Elizabeth", "Liz", "Betty"});
data.Add(new string[] {"Bob", "Robert", "Rob"});
var alternateNames1 = data.Where(x =>x.Contains("Betty")).Select(x => x.Where(y => y != "Betty"));
Upvotes: 0