Majid Azarniush
Majid Azarniush

Reputation: 643

How can compare two unsorted string in c#?

I have 2 string for example "abcdefg" "bcagfed" I want to write a method that determine these two strings are equal

*sorted or unsorted is not matter. In my example my above strings are equal.

One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way

Upvotes: 1

Views: 424

Answers (6)

Caius Jard
Caius Jard

Reputation: 74730

One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way

That might not actually be true. The following is the current fastest approach here, about 20% faster than the other answer I posted and about twice as fast as IdleMind's approach (next quickest):

static bool SameChars(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var c1 = str1.ToCharArray();
    var c2 = str2.ToCharArray();
    
    Array.Sort(c1);
    Array.Sort(c2);

    for(int i = c1.Length - 1; i >= 0; i--)
      if(c1[i] != c2[i]) return false;
      
    return true;
}

..and it's simpler to understand than my other answer

Upvotes: 2

pm100
pm100

Reputation: 50210

I did some timing comparisons, just for fun

using

a) NazaRN

  var areEqual = Enumerable.SequenceEqual(str1.OrderBy(c => c), str2.OrderBy(c => c));

b) IdleMind

bool stringsAreEqual = (str1.Length == str2.Length);
if (stringsAreEqual) {
    // see if each char from str1 is in str2, removing from str2 as we go
    List<char> str2chars = new List<char>(str2.ToCharArray());
    foreach (char c in str1.ToCharArray()) {
        int index = str2chars.IndexOf(c);
        if (index != -1) {
            str2chars.RemoveAt(index);
        }
        else {
            stringsAreEqual = false;
            break;
        }
    }
    // any chars left over in str2?
    stringsAreEqual = stringsAreEqual && (str2chars.Count == 0);
}

c) CaiusJard

static bool SameChars(string str1, string str2){

    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    foreach(var c in str1){
      var r = ++counts[c - 'a'];
      if(r == 1) numPositive++;
    }

    foreach(var c in str2){
      var r = --counts[c - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

d) MatthewWatson

var lookup1 = s1.ToLookup(ch => ch); // O(N)
var lookup2 = s2.ToLookup(ch => ch); // O(N)

return lookup1.All(keyValue => // O(N)
    lookup2.Contains(keyValue.Key) && keyValue.Count() == lookup2[keyValue.Key].Count());

Doing 1 million iterations (relative timings: smaller numbers means faster, primitive method here: https://dotnetfiddle.net/WjcPO3 )

  • a = 75
  • b = 16
  • c = 10
  • d = 100

We have a clear winner, but I would still go with a) for reasons @Heinzi stated in the comments:

"One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way" - Indeed, but except in the rarest of circumstances, the speed difference simply won't matter. CPU time is cheap, developer time is expensive, so go for the simplest and most maintainable solution, not for the fastest one.

Upvotes: 1

Caius Jard
Caius Jard

Reputation: 74730

Here's an option that uses the curio of being able to treat chars as ints. It relies on the characters in the strings being a-z (but see the footnote if you want a wider range of chars). It's a simplistic form of dictionary where the character itself is used as an indexer, but it doesn't need any advanced technology at all - it purely relies on arrays and ints (and if you class foreach/enumerations of chars in a string as advanced they could be swapped for a normal for):

static bool SameChars(string str1, string str2){

    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    foreach(var c in str1){
      var r = ++counts[c - 'a'];
      if(r == 1) numPositive++;
    }

    foreach(var c in str2){
      var r = --counts[c - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

If the string lengths differ it's an easy win. Otherwise we'll use an array of 26 ints and index it by the character we're looking at, to keep a count of the number of times we see each character. Suppose we're iterating and we encounter our first letter a - the code ++counts[c - 'a'] will do 'a' - 'a' (which is 0) and increment array index [0] from 0 to 1.

The resulting counter, 1 is captured into r. We examine r and see if it's 1; the only time it will reasonably be 1 is the first time we see a given letter, so we increment numPositive which is a counter of the number of array indexes that are holding a positive number

We then loop over the second string, using the chars in it to decrement the counts. Again we capture the number now stored in that array position, and here we can make a small optimization; if any array index goes negative, then str2 must contain more of that character than str1 does, so early return false

If the resulting counter for a given letter has hit 0, decrement 1 from numPositive, the variable that is tracking the number of array positions that are holding positive numbers.

At the end of the operation we want to know if any of the counts elements are still holding a positive number; rather than checking each one, we can just look to see if numPositive is 0

Footnote: If you wanted to support a wider character range you'd need a bigger array (e.g. if you want to support A-Z and a-z you'll need an array that is 'z'-'A' long, and you'll need to do - 'A' in your math

Edit:

Swapping the foreach out for for makes things about 10% quicker:

static bool SameCharsF(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    for(int i = 0; i < str1.Length; i++){
      var r = ++counts[str1[i] - 'a'];
      if(r == 1) numPositive++;
    }

    for(int i = 0; i < str2.Length; i++){
      var r = --counts[str1[i] - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

Here are a couple of variations that use Dictionary; they can cope with any number of any kind of char without special treatment but they're 2-3x slower. Surprisingly, the four loop version that preloads the dictionary so ContainsKey can be omitted isn't significantly slower than the ContainsKey version:

static bool SameCharsDP(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var d = new Dictionary<char, int>();
    var numPositive = 0;
    
    foreach(var c in str1)
      d[c] = 0;
    
    foreach(var c in str2)
      d[c] = 0;
    
    foreach(var c in str1){
      var r = ++d[c];
      if(r == 1) numPositive++;
    }
    
    foreach(var c in str2){
      var r = --d[c];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

static bool SameCharsDCK(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var d = new Dictionary<char, int>();
    var numPositive = 0;
    
    foreach(var c in str1){
        if(!d.ContainsKey(c))
            d[c]=0;
        var r = ++d[c];
        if(r == 1) numPositive++;
    }
    
    foreach(var c in str2){
        if(!d.ContainsKey(c))
            d[c]=0;
        var r = --d[c];
        if(r < 0) return false;
        if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

Upvotes: 0

Matthew Watson
Matthew Watson

Reputation: 109852

As others have pointed out, the easiest way is to sort then compare.

For academic interest, here's a way of doing it without sorting:

public static bool StringsContainSameChars(string s1, string s2)
{
    var lookup1 = s1.ToLookup(ch => ch); // O(N)
    var lookup2 = s2.ToLookup(ch => ch); // O(N)

    return lookup1.All(keyValue => // O(N)
        lookup2.Contains(keyValue.Key) && keyValue.Count() == lookup2[keyValue.Key].Count());
}

In theory this is an O(N) operation, but it would have a huge overhead and is likely to be slower than the obvious sorting approach. If N was very large, then this would in theory be faster.

Note: I'm saying this is O(N) because:

  • Enumerable.ToLookup() is an O(N) operation
  • Lookup<TKey,TElement>.Contains() is an O(1) operation.
  • The Lookup<TKey,TElement> indexer ([]) is an O(1) operation.
  • keyValue.Count() will be an O(1) operation even though it's using Linq.Count(), because the underlying type implements ICollection<T> which allows Linq to perform the shortcut of using the .Count property.

Therefore the overall complexity is O(N).

But like I say, this is really of academic interest only.

Upvotes: 0

NazaRN
NazaRN

Reputation: 419

I prefer using SequenceEqual, it checks whether the two source sequences are of equal length and their corresponding elements are equal.

var str1 = "abcdefg"; 
var str2 = "bcagfed";

var areEqual = Enumerable.SequenceEqual(str1.OrderBy(c => c), str2.OrderBy(c => c));

Upvotes: 7

Idle_Mind
Idle_Mind

Reputation: 39152

Here's one way to do it without sorting first, and without using anything fancy like LINQ:

string str1 = "abcdefg";
string str2 = "bcagfed";

bool stringsAreEqual = (str1.Length == str2.Length);
if (stringsAreEqual)
{
    // see if each char from str1 is in str2, removing from str2 as we go
    List<char> str2chars = new List<char>(str2.ToCharArray());
    foreach (char c in str1.ToCharArray())
    {
        int index = str2chars.IndexOf(c);
        if (index != -1)
        {
            str2chars.RemoveAt(index);
        }
        else
        {
            stringsAreEqual = false;
            break;
        }
    }
    // any chars left over in str2?
    stringsAreEqual = stringsAreEqual && (str2chars.Count == 0);
}            

Console.WriteLine(str1);
Console.WriteLine(str2);
Console.WriteLine(stringsAreEqual);

Upvotes: 4

Related Questions