Taha Rehman Siddiqui

Reputation: 2533

Compare a set of three strings with another

I am building a list of unique "sets of 3 strings" from some data: whenever 3 strings occur together they form a set, and the list may contain each set only once.

  1. A,B,C
  2. B,C,D
  3. D,E,F and so on

I keep adding sets to the list if they are not in it already, so that if I encounter the three strings {A,B,C} together again I won't add them a second time. So I have 2 questions, and the answer to the second one actually depends on the answer to the first.

  1. How should I store this set of 3 strings: a List, an array, a concatenated string, or something else? (I may add it to a Dictionary to record their count as well, but that's for later.)
  2. How do I compare one set of 3 strings with another, irrespective of their order (obviously depending on the structure used)? I want to know a proper solution to this rather than doing everything naively!

I am using C# by the way.

Upvotes: 1

Views: 813

Answers (4)

Igor Bendrup

Reputation: 2837

You can inherit from List<String> and override Equals() and GetHashCode() methods:

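using System;
using System.Collections.Generic;
using System.Linq;

public class StringList : List<String>
{
    public override bool Equals(object obj)
    {
        StringList other = obj as StringList;
        if (other == null) return false;
        // Compare counts first; otherwise a subset would compare equal to its superset.
        // This assumes there are no duplicate strings within a set.
        return this.Count == other.Count && this.All(x => other.Contains(x));
    }
    public override int GetHashCode()
    {
        unchecked
        {
            // Addition is commutative, so the hash does not depend on element order.
            int hash = 19;
            foreach (String s in this)
            {
                hash = hash + s.GetHashCode() * 31;
            }
            return hash;
        }
    }
}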

Now you can use HashSet<StringList> to store only unique sets.
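For comparison, here is a minimal sketch of the same deduplication using only framework types instead of the StringList class above. It swaps in HashSet<string>.CreateSetComparer(), which compares inner sets by content, and assumes top-level statements (C# 9 or later). The variable names are just for illustration.

```csharp
using System;
using System.Collections.Generic;

// The outer set deduplicates inner sets by content, ignoring element order.
var uniqueSets = new HashSet<HashSet<string>>(HashSet<string>.CreateSetComparer());

bool added1 = uniqueSets.Add(new HashSet<string> { "A", "B", "C" });
bool added2 = uniqueSets.Add(new HashSet<string> { "C", "A", "B" }); // same elements, different order
bool added3 = uniqueSets.Add(new HashSet<string> { "B", "C", "D" });

Console.WriteLine($"{added1} {added2} {added3} count={uniqueSets.Count}");
// prints "True False True count=2"
```

Add returns false for the reordered set, so no separate "does it exist already" check is needed.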

Upvotes: 0

maxkoryukov

Reputation: 4556

Here is a simple string wrapper for you:

/// The wrapper for three strings
public class StringTriplet
{

    private List<string> Store;

    // accessors to three source strings:
    public string A { get; private set; }
    public string B { get; private set; }
    public string C { get; private set; }

    // constructor (needs to fill the internal storage)
    public StringTriplet(string a, string b, string c)
    {
        this.Store = new List<string>();
        this.Store.Add(a);
        this.Store.Add(b);
        this.Store.Add(c);
        // sort is required, because later we don't want to compare every string with every other one;
        // ordinal order keeps the comparison consistent with GetHashCode
        this.Store.Sort(StringComparer.Ordinal);
        this.A = a;
        this.B = b;
        this.C = c;
    }


    // additional method. you could add an IComparable declaration to the entire class, but it is not necessary for your task...
    public int CompareTo(StringTriplet obj)
    {
        // by convention, any instance sorts after null
        if (null == obj)
            return 1;

        int cmp;
        cmp = this.Store.Count.CompareTo(obj.Store.Count);
        if (0 != cmp)
            return cmp;

        for (int i = 0; i < this.Store.Count; i++)
        {
            // CompareOrdinal accepts null strings (two nulls compare as equal)
            cmp = string.CompareOrdinal(this.Store[i], obj.Store[i]);
            if ( 0 != cmp )
                return cmp;
        }

        return 0;
    }

    // additional method. it is a good practice : override both 'Equals' and 'GetHashCode'. See below..
    override public bool Equals(object obj)
    {
        if (! (obj is StringTriplet))
            return false;
        var t = obj as StringTriplet;
        return ( 0 == this.CompareTo(t));
    }

    // necessary method. it will be used implicitly when adding values to the HashSet
    public override int GetHashCode()
    {
        int res = 0;
        for (int i = 0; i < this.Store.Count; i++)
            res = res ^ (null == this.Store[i] ? 0 : this.Store[i].GetHashCode()) ^ i;

        return res;
    }
}

Now you can just create a HashSet and add values:

var t = new HashSet<StringTriplet> ();

t.Add (new StringTriplet ("a", "b", "c"));
t.Add (new StringTriplet ("a", "b1", "c"));
t.Add (new StringTriplet ("a", "b", "c"));  // dup
t.Add (new StringTriplet ("a", "c", "b"));  // dup
t.Add (new StringTriplet ("1", "2", "3"));
t.Add (new StringTriplet ("1", "2", "4"));
t.Add (new StringTriplet ("3", "2", "1"));  // dup of ("1", "2", "3")

foreach (var s in t) {
    Console.WriteLine (s.A + " " + s.B + " " + s.C);
}
return 0;

Upvotes: 0

DrewJordan

Reputation: 5314

Probably the best way is to use a HashSet, if you don't need to have duplicate elements in your sets. It sounds like each set of 3 has 3 unique elements; if that is actually the case, I would combine a HashSet approach with the concatenation that you already worked out, i.e. order the elements, combine them with some separator, and then add the concatenated string to a HashSet, which will prevent duplicates from ever occurring in the first place.
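A minimal sketch of that sort-then-join idea, assuming top-level statements (C# 9 or later). MakeKey is a hypothetical helper, and the U+001F separator is an assumption: it only works if that character never appears inside your strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds an order-independent key for three strings.
// Assumes the separator character never occurs inside the strings themselves.
string MakeKey(string a, string b, string c)
{
    var parts = new[] { a, b, c };
    Array.Sort(parts, StringComparer.Ordinal); // culture-independent ordering
    return string.Join("\u001F", parts);       // U+001F (unit separator) as delimiter
}

var seen = new HashSet<string>();
bool first = seen.Add(MakeKey("A", "B", "C"));
bool dup = seen.Add(MakeKey("C", "A", "B"));   // same strings, different order
bool other = seen.Add(MakeKey("B", "C", "D"));

Console.WriteLine($"{first} {dup} {other} count={seen.Count}");
// prints "True False True count=2"
```

Sorting three strings is cheap, and every lookup after that is a single hash probe rather than a scan over the existing sets.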

If your sets of three could have duplicate elements, then Kevin's approach is what you're going to have to do for each. You might get some better performance from using a list of HashSets for each set of three, but with only three elements the overhead of creating a hash for each element of potentially millions of sets seems like it would perform worse than just iterating over them once.

Upvotes: 1

Kevin

Reputation: 551

  1. Either an array or a list is your best bet for storing the data, since, as wentimo mentioned in a comment, concatenating them means that you are losing data that you may need. To steal his example, "ab", "cd" and "ef" concatenated together are the same as "abcd", "e" and "f" concatenated, but they shouldn't be treated as equivalent sets.

  2. To compare them, I would order the list alphabetically, then compare each value in order. That takes care of the fact that the order of the values doesn't matter. An example might look like this:

    // Note: this sorts both lists in place.
    static bool Compare(List<string> a, List<string> b)
    {
        a.Sort();
        b.Sort();
        if(a.Count == b.Count)
        {
            for(int i = 0; i < a.Count; i++)
            {
                if(a[i] != b[i])
                {
                    return false;
                }
            }
            return true;
        }
        else
        {
            return false;
        }
    }
    

Update

Now that you have stated in a comment that performance is an important consideration, since you may have millions of these sets to compare, and that you won't have duplicate elements in a set, here is a more optimized version of my code. Note that I no longer have to sort the two lists, which will save quite a bit of time when executing this function.

// Only valid when neither list contains duplicate elements.
static bool Compare(List<string> a, List<string> b)
{
    if(a.Count == b.Count)
    {
        for(int i = 0; i < a.Count; i++)
        {
            if(!b.Contains(a[i]))
            {
                return false;
            }
        }
        return true;
    }
    else
    {
        return false;
    }
}

DrewJordan's approach of using a HashSet is still probably faster than my approach, since it just has to sort each set of three once and can then do the comparison against your existing sets much faster than my approach can.
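If you also want the counts mentioned in the question, one way to combine "sort once, then hash" with the point about concatenation losing data is to key a Dictionary on a sorted tuple rather than a joined string. This is only a sketch: Canonical and Record are hypothetical helpers, and it assumes C# 9 or later (top-level statements and value tuples).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the three strings as a tuple in ordinal order,
// so every permutation of the same strings maps to the same key.
(string, string, string) Canonical(string a, string b, string c)
{
    var parts = new[] { a, b, c };
    Array.Sort(parts, StringComparer.Ordinal);
    return (parts[0], parts[1], parts[2]);
}

var counts = new Dictionary<(string, string, string), int>();

// Hypothetical helper: increments the count for the set {a, b, c}.
void Record(string a, string b, string c)
{
    var key = Canonical(a, b, c);
    counts.TryGetValue(key, out int n); // n stays 0 when the key is new
    counts[key] = n + 1;
}

Record("A", "B", "C");
Record("C", "B", "A");    // counted as the same set
Record("ab", "cd", "ef");
Record("abcd", "e", "f"); // stays distinct; nothing is concatenated

Console.WriteLine(counts[Canonical("B", "A", "C")]); // prints 2
Console.WriteLine(counts.Count);                     // prints 3
```

Because the tuple keeps the three strings separate, the "ab"/"cd"/"ef" versus "abcd"/"e"/"f" collision cannot happen, and no separator has to be chosen.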

Upvotes: 3
