Reputation: 2533
I am building a list of unique sets of three strings from some data: whenever three strings occur together they form a set, and my list may contain only unique sets.
I add a set to the list only if it is not already there, so if I encounter the three strings {A,B,C} together again, I won't add them a second time. I have two questions, and the answer to the second depends on the answer to the first.
I am using C#, by the way.
Upvotes: 1
Views: 813
Reputation: 2837
You can inherit from List<String> and override the Equals() and GetHashCode() methods:
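using System;
using System.Collections.Generic;
using System.Linq;

public class StringList : List<String>
{
    // Two lists are equal when they hold the same strings, regardless of order.
    public override bool Equals(object obj)
    {
        StringList other = obj as StringList;
        if (other == null) return false;
        // All/Contains alone would also accept a subset, so compare the sorted contents instead.
        return this.OrderBy(x => x, StringComparer.Ordinal)
                   .SequenceEqual(other.OrderBy(x => x, StringComparer.Ordinal));
    }

    // Summing the element hashes keeps the result independent of order, in line with Equals.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 19;
            foreach (String s in this)
            {
                hash = hash + s.GetHashCode() * 31;
            }
            return hash;
        }
    }
}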
Now you can use HashSet<StringList> to store only unique sets.
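As an alternative to subclassing, here is a minimal sketch that uses the framework's own set comparer, HashSet<string>.CreateSetComparer(), so that the outer HashSet compares inner sets by content. The class and method names (SetOfSetsDemo, CountUnique) are made up for illustration. Note the assumption: each triple becomes a HashSet<string>, so repeated strings inside one triple collapse into one element.

```csharp
using System;
using System.Collections.Generic;

public static class SetOfSetsDemo
{
    // Hypothetical helper: counts how many distinct sets occur in the input.
    public static int CountUnique(IEnumerable<string[]> triples)
    {
        // The set comparer makes two HashSet<string> equal when they hold the same elements.
        var unique = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());
        foreach (string[] triple in triples)
        {
            // Add returns false (and changes nothing) when an equal set is already stored.
            unique.Add(new HashSet<string>(triple));
        }
        return unique.Count;
    }
}
```

With this, {"a","b","c"} and {"c","a","b"} count as one set, with no custom Equals or GetHashCode to maintain.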
Upvotes: 0
Reputation: 4556
Here is a simple wrapper for three strings:
using System;
using System.Collections.Generic;

/// The wrapper for three strings
public class StringTriplet
{
    private List<string> Store;
    // accessors to the three source strings:
    public string A { get; private set; }
    public string B { get; private set; }
    public string C { get; private set; }

    // constructor (needs to fill the internal storage)
    public StringTriplet(string a, string b, string c)
    {
        this.Store = new List<string>();
        this.Store.Add(a);
        this.Store.Add(b);
        this.Store.Add(c);
        // sorting is required, because later we don't want to compare every string with every other one;
        // an ordinal sort keeps the ordering consistent with string.GetHashCode
        this.Store.Sort(StringComparer.Ordinal);
        this.A = a;
        this.B = b;
        this.C = c;
    }

    // additional method. you could add an IComparable declaration to the class, but it is not necessary for your task...
    public int CompareTo(StringTriplet obj)
    {
        // by convention any instance compares greater than null
        if (null == obj)
            return 1;
        int cmp = this.Store.Count.CompareTo(obj.Store.Count);
        if (0 != cmp)
            return cmp;
        for (int i = 0; i < this.Store.Count; i++)
        {
            // CompareOrdinal accepts nulls, so two null entries compare as equal
            cmp = string.CompareOrdinal(this.Store[i], obj.Store[i]);
            if (0 != cmp)
                return cmp;
        }
        return 0;
    }

    // additional method. it is good practice to override both 'Equals' and 'GetHashCode'. See below.
    public override bool Equals(object obj)
    {
        var t = obj as StringTriplet;
        if (null == t)
            return false;
        return (0 == this.CompareTo(t));
    }

    // necessary method. it is used implicitly when adding values to the HashSet
    public override int GetHashCode()
    {
        int res = 0;
        for (int i = 0; i < this.Store.Count; i++)
            res = res ^ (null == this.Store[i] ? 0 : this.Store[i].GetHashCode()) ^ i;
        return res;
    }
}
Now you can just create a HashSet and add values:
var t = new HashSet<StringTriplet>();
t.Add(new StringTriplet("a", "b", "c"));
t.Add(new StringTriplet("a", "b1", "c"));
t.Add(new StringTriplet("a", "b", "c")); // dup
t.Add(new StringTriplet("a", "c", "b")); // dup
t.Add(new StringTriplet("1", "2", "3"));
t.Add(new StringTriplet("1", "2", "4"));
t.Add(new StringTriplet("3", "2", "1")); // dup of "1", "2", "3"
foreach (var s in t)
{
    Console.WriteLine(s.A + " " + s.B + " " + s.C);
}
return 0;
Upvotes: 0
Reputation: 5314
Probably the best way is to use a HashSet, if you don't need to have duplicate elements in your sets. It sounds like each set of 3 has 3 unique elements; if that is actually the case, I would combine a HashSet approach with the concatenation that you already worked out, i.e. order the elements, combine them with some separator, and then add the concatenated string to a HashSet, which will prevent duplicates from ever occurring in the first place.
If your sets of three could have duplicate elements, then Kevin's approach is what you're going to have to do for each. You might get somewhat better performance from using a list of HashSets, one for each set of three, but with only three elements the overhead of creating a hash for each element of potentially millions of sets seems like it would perform worse than just iterating over them once.
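The order, join and add idea from the first paragraph can be sketched roughly as follows. TripleKeys, MakeKey and AddIfNew are hypothetical names, and the separator is an assumption: it must be a character that never occurs inside the strings, otherwise two different sets could produce the same key.

```csharp
using System;
using System.Collections.Generic;

public static class TripleKeys
{
    // Assumption: the ASCII unit separator never appears in the input strings.
    private const char Separator = '\u001F';

    // Builds an order-independent key by sorting the three strings and joining them.
    public static string MakeKey(string a, string b, string c)
    {
        var parts = new[] { a, b, c };
        Array.Sort(parts, StringComparer.Ordinal);
        return string.Join(Separator.ToString(), parts);
    }

    // Returns true if the set was new and has been added, false if it was already present.
    public static bool AddIfNew(HashSet<string> seen, string a, string b, string c)
    {
        return seen.Add(MakeKey(a, b, c));
    }
}
```

Because HashSet<string>.Add already reports whether the key was present, no separate Contains check is needed before adding.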
Upvotes: 1
Reputation: 551
Either an array or a list is your best bet for storing the data since, as wentimo mentioned in a comment, concatenating them means losing data that you may need. To steal his example, "ab", "cd" and "ef" concatenated together give the same string as "abcd", "e" and "f" concatenated, but the two shouldn't be treated as equivalent sets.
To compare them, I would order each list alphabetically, then compare the values in order. That takes care of the fact that the order of the values doesn't matter. A sketch might look like this:
static bool Compare(List<string> a, List<string> b)
{
    // Note: Sort() reorders the callers' lists in place.
    a.Sort(StringComparer.Ordinal);
    b.Sort(StringComparer.Ordinal);
    if (a.Count == b.Count)
    {
        for (int i = 0; i < a.Count; i++)
        {
            if (a[i] != b[i])
            {
                return false;
            }
        }
        return true;
    }
    else
    {
        return false;
    }
}
Since you stated in a comment that performance is an important consideration, because you may have millions of these sets to compare, and that you won't have duplicate elements within a set, here is a more optimized version of my code. Note that I no longer have to sort the two lists, which saves quite a bit of time when executing this function.
static bool Compare(List<string> a, List<string> b)
{
    // Only valid when neither list contains duplicate elements.
    if (a.Count == b.Count)
    {
        for (int i = 0; i < a.Count; i++)
        {
            if (!b.Contains(a[i]))
            {
                return false;
            }
        }
        return true;
    }
    else
    {
        return false;
    }
}
DrewJordan's approach of using a hashtable is still probably faster than my approach, since it just has to sort each set of three and can then do the comparison against your existing sets much faster than my approach can.
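One way to get the hashed lookup without concatenating the strings, and so without the data loss described at the top of this answer, is to store each sorted set as a tuple. This is only a sketch, assuming C# 7 or later for value tuples; TripleSet, Normalize and AddIfNew are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TripleSet
{
    // Returns the three strings as a tuple in ordinal order, so {A,B,C} and {C,A,B} normalize identically.
    public static (string, string, string) Normalize(string a, string b, string c)
    {
        var parts = new[] { a, b, c };
        Array.Sort(parts, StringComparer.Ordinal);
        return (parts[0], parts[1], parts[2]);
    }

    // Value tuples compare element by element, so the HashSet detects duplicates directly.
    public static bool AddIfNew(HashSet<(string, string, string)> seen, string a, string b, string c)
    {
        return seen.Add(Normalize(a, b, c));
    }
}
```

Each lookup then costs one three-element sort plus one hash probe, instead of a scan over every set stored so far.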
Upvotes: 3