lserlohn
lserlohn

Reputation: 6206

fast way for string comparison

I have a simple question but it makes me confused.

I have two strings, and I want to count how many different characters between the two. The strings are sorted, equal length. Do not split the strings.

For example

input:  abc, bcd
output: 2, because a and d are different characters

input:  abce, bccd
output: 4, because a, c, d and e are different.

I know I can do it in O(N^2), but how can I solve it in O(N) for these sorted strings?

Only need the number of different characters, no need to indicate which number.

Upvotes: 1

Views: 782

Answers (4)

Alejandro
Alejandro

Reputation: 3082

After seeing how simple the answer really is, thanks to @Bill Lynch , my solution may be too complex! Anyways, its a simple counting-difference.

#include <iostream>
#include <algorithm>
#include <array>

int main() {
    std::array<int,26> str1 = {};
    std::array<int,26> str2 = {};

    std::string s1("abce");
    std::string s2("bccd");


    for(char c : s1)
        ++str1[c-'a'];
    for(char c : s2)
        ++str2[c-'a'];

    int index = 0;

    std::cout << std::count_if(str1.begin(),str1.end(),[&](int x)
    {
        return x != str2[index++];
    });
}

Its O(n+m), unless I've made a mistake in the analysis.

Upvotes: 1

Bill Lynch
Bill Lynch

Reputation: 81916

I was originally thinking that you needed a fairly complicated algorithm, like Smith-Waterman for example. But the restrictions on your input makes it fairly easy to implement this in O(m + n), where m is the length of the first string, and n is the length of the second string.

We can use a builtin algorithm to calculate the number of characters that are in common, and then we can use that information to produce the number you are looking for:

#include <algorithm>
#include <iostream>
#include <string>

int main() {
    std::string m = "abce";
    std::string n = "bccd";
    std::string result;

    std::set_intersection(
            m.begin(), m.end(),
            n.begin(), n.end(),
            std::back_inserter(result));

    std::cout << m.size() + n.size() - 2 * result.size() << "\n";
}

In this particular case, it outputs 4, as you wanted.

Upvotes: 3

GorvGoyl
GorvGoyl

Reputation: 49180

you can achieve O(n) using dynamic programming. i.e. use an integer d for storing difference.

Algo:
move from lower index to higher index of both array.  
if a[i] not equal b[j]:
           increase d by 2
           move the index of smaller array and check again.
if a[i] is equal to b[j] : 
           decrease d by 1
           move both index
repeat this until reach the end of array

Upvotes: 0

Sergio0694
Sergio0694

Reputation: 4567

O(2n) and O(n) are exactly the same thing, since the "O" indicates the asymptotic behavior for the cost of your method.

Update: I just noticed you meant O(n^2) with your O(N2).

If you need to do that comparison, you'll always have O(n^2) as your cost, since you have to:

1) Loop for every character of your words, and this is O(n)

2) Compare the current character in each word, and you'll have to use a temporary list that contains the characters you have already checked. So, this is another nested O(n).

So, O(n) * O(n) = O(n^2).

Note: you can always ignore a numeric coefficient inside your O expression, as it doesn't matter.

Upvotes: -1

Related Questions