Reputation: 589

Finding common characters in two strings

I am coding for the problem in which we got to count the number of common characters in two strings. Main part of the count goes like this

for(i=0; i < strlen(s1); i++) {
    for(j = 0; j < strlen(s2); j++) {
        if(s1[i] == s2[j]) {
            count++;
            s2[j] = '*';
            break;
        }
    }
}

This goes with an O(n^2) logic. However I could not think of a better solution than this. Can anyone help me in coding with an O(n) logic.

Upvotes: 6

Answers (13)

Antara Kundu

Reputation: 1

Python Code:

    >>>s1='abbc'    
    >>>s2='abde'
    >>>p=list(set(s1).intersection(set(s2)))
    >>print(p)
    ['a','b']

Hope this helps you, Happy Coding!

Upvotes: 0

Denys

Reputation: 1478

No need to initialize and keep an array of 26 elements (numbers for each letter in alphabet). Just fo the following:

Using HashMap store letter as a key and integer got the count as a value.
Create a Set of characters.
Iterate through each string characters, add to the Set from step 2. If add() method returned false, (means that same character already exists in the Set), then add the character to the map and increment the value.

These steps are written considering Java programming language.

Upvotes: 0

ARUPJYOTI DUTTA

Reputation: 1

Their is a more better version in c++ : C++ bitset and its application A bitset is an array of bool but each Boolean value is not stored separately instead bitset optimizes the space such that each bool takes 1 bit space only, so space taken by bitset bs is less than that of bool bs[N] and vector bs(N). However, a limitation of bitset is, N must be known at compile time, i.e., a constant (this limitation is not there with vector and dynamic array)

As bitset stores the same information in compressed manner the operation on bitset are faster than that of array and vector. We can access each bit of bitset individually with help of array indexing operator [] that is bs[3] shows bit at index 3 of bitset bs just like a simple array. Remember bitset starts its indexing backward that is for 10110, 0 are at 0th and 3rd indices whereas 1 are at 1st 2nd and 4th indices. We can construct a bitset using integer number as well as binary string via constructors which is shown in below code. The size of bitset is fixed at compile time that is, it can’t be changed at runtime. For more information about bitset visit the site : https://www.geeksforgeeks.org/c-bitset-and-its-application

The code is as follows :

    // considering the strings to be of lower case.

    int main()
    {
        string s1,s2;
        cin>>s1>>s2;

        //Declaration for bitset type variables
        bitset<26> b_s1,b_s2;

        // setting the bits in b_s1 for the encountered characters of string s1 
        for(auto& i : s1)
        {
          if(!b_s1[i-'a'])
            b_s1[i-'a'] = 1;
        }

        // setting the bits in b_s2 for the encountered characters of string s2 
        for(auto& i : s2)
        {
          if(!b_s2[i-'a'])
            b_s2[i-'a'] = 1;
        }

       // counting the number of set bits by the "Logical AND"  operation
       // between b_s1 and b_s2
       cout<<(b_s1&b_s2).count(); 

     }

Upvotes: 0

Visiedo

Reputation: 387

First, your code does not run in O(n^2), it runs in O(nm), where n and m are the length of each string.

You can do it in O(n+m), but not better, since you have to go through each string, at least once, to see if a character is in both.

An example in C++, assuming:

ASCII characters
All characters included (letters, numbers, special, spaces, etc...)
Case sensitive



std::vector<char> strIntersect(std::string const&s1, std::string const&s2){

    std::vector<bool> presents(256, false);  //Assuming ASCII
    std::vector<char> intersection;

    for (auto c : s1) {
        presents[c] = true;
    }
    for (auto c : s2) {
        if (presents[c]){
            intersection.push_back(c);
            presents[c] = false;
        }
    }
    return intersection;
}

int main() {
    std::vector<char> result; 
    std::string s1 = "El perro de San Roque no tiene rabo, porque Ramon Rodriguez se lo ha cortado";
    std::string s2 = "Saint Roque's dog has no tail, because Ramon Rodriguez chopped it off";

    //Expected: "S a i n t   R o q u e s d g h l , b c m r z p"

    result = strIntersect(s1, s2);
    for (auto c : result) {
         std::cout << c << " ";
    }
    std::cout << std::endl;

    return 0;
}

Upvotes: 0

Aman

Reputation: 8995

It can be done in O(n) time with constant space.

The pseudo code goes like this :

int map1[26], map2[26];
int common_chars = 0;

for c1 in string1:
    map1[c1]++;

for c2 in string2:
    map2[c2]++;

for i in 1 to 26:
    common_chars += min(map1[i], map2[i]);

Upvotes: 7

Raj Katta

Reputation: 11

int count(string a, string b) 
{

   int i,c[26]={0},c1[26]={};

   for(i=0;i<a.length();i++)
   {
        if(97<=a[i]&&a[i]<=123)
        c[a[i]-97]++;
   }
   for(i=0;i<b.length();i++)
   {
       if(97<=b[i]&&b[i]<=123)
        c1[b[i]-97]++;
    } 
    int s=0;
    for(i=0;i<26;i++)
    {
        s=s+abs(c[i]+c1[i]-(c[i]-c1[i]));

    }   

    return (s);  
}

This is much easier and better solution

Upvotes: 1

Shridhar.S

Reputation: 112

Following code traverses each sting only once. So the complexity is O(n). One of the assumptions is that the upper and lower cases are considered same.

 #include<stdio.h>

int main() {
    char a[] = "Hello world";
    char b[] = "woowrd";
    int x[26] = {0};
    int i;
    int index;

    for (i = 0; a[i] != '\0'; i++) {
        index = a[i] - 'a';
        if (index > 26) {
            //capital char
            index = a[i] - 'A';
        }
        x[index]++;
    }

    for (i = 0; b[i] != '\0'; i++) {
        index = b[i] - 'a';
        if (index > 26) {
            //capital char
            index = b[i] - 'A';
        }

        if (x[index] > 0)
            x[index] = -1;
    }

    printf("Common characters in '%s' and '%s' are ", a, b);
    for (i = 0; i < 26; i++) {
        if (x[i] < 0)
            printf("%c", 'a'+i);
    }
    printf("\n");
}

Upvotes: 1

Amulya Gaur

Reputation: 1

can be easily done using the concept of "catching" which is a sub-algorithm of hashing.

Upvotes: -1

B Jacob

Reputation: 389

C implementation to run in O(n) time and constant space.

#define ALPHABETS_COUNT 26 
int commonChars(char *s1, char *s2)
{
    int c_count = 0, i; 
    int arr1[ALPHABETS_COUNT] = {0}, arr2[ALPHABETS_COUNT] = {0};

    /* Compute the number of occurances of each character */
    while (*s1) arr1[*s1++-'a'] += 1;
    while (*s2) arr2[*s2++-'a'] += 1;       

    /* Increment count based on match found */
    for(i=0; i<ALPHABETS_COUNT; i++) {
        if(arr1[i] == arr2[i]) c_count += arr1[i];
        else if(arr1[i]>arr2[i] && arr2[i] != 0) c_count += arr2[i];
        else if(arr2[i]>arr1[i] && arr1[i] != 0) c_count += arr1[i];
    }

    return c_count;

}

Upvotes: 0

Paul Hankin

Reputation: 58261

Your current code is O(n^3) because of the O(n) strlens and produces incorrect results, for example on "aa", "aa" (which your code will return 4).

This code counts letters in common (each letter being counted at most once) in O(n).

int common(const char *a, const char *b) {
    int table[256] = {0};
    int result = 0;
    for (; *a; a++)table[*a]++;
    for (; *b; b++)result += (table[*b]-- > 0);
    return result;
}

Depending on how you define "letters in common", you may have different logic. Here's some testcases for the definition I'm using (which is size of the multiset intersection).

int main(int argc, char *argv[]) {
    struct { const char *a, *b; int want; } cases[] = {
        {"a", "a", 1},
        {"a", "b", 0},
        {"a", "aa", 1},
        {"aa", "a", 1},
        {"ccc", "cccc", 3},
        {"aaa", "aaa", 3},
        {"abc", "cba", 3},
        {"aasa", "asad", 3},
    };
    int fail = 0;
    for (int i = 0; i < sizeof(cases) / sizeof(*cases); i++) {
        int got = common(cases[i].a, cases[i].b);
        if (got != cases[i].want) {
            fail = 1;
            printf("common(%s, %s) = %d, want %d\n",
                   cases[i].a, cases[i].b, got, cases[i].want);
        }
    }
    return fail;
}

Upvotes: 3

Mustafa Chelik

Reputation: 2184

You can do it with 2n:

int i,j, len1 = strlen(s1), len2 = strlen(s2);
unsigned char allChars[256] = { 0 };
int count = 0;

for( i=0; i<len1; i++ )
{
    allChars[ (unsigned char) s1[i] ] = 1;
}

for( i=0; i<len2; i++ )
{
    if( allChars[ (unsigned char) s1[i] ] == 1 )
    {
        allChars[ (unsigned char) s2[i] ] = 2;
    }
}

for( i=0; i<256; i++ )
{
    if( allChars[i] == 2 )
    {
        cout << allChars[i] << endl;
        count++;
    }
}

Upvotes: 1

Zuko

Reputation: 2914

for (std::vector<char>::iterator i = s1.begin(); i != s1.end(); ++i)
{
    if (std::find(s2.begin(), s2.end(), *i) != s2.end())
   {
    dest.push_back(*i);
   }
}

taken from here

Upvotes: 0

haccks

Reputation: 106012

This is very simple. Take two int arrays freq1 and freq2. Initialize all its elements to 0. Then read your strings and store the frequencies of the characters to these arrays. After that compare the arrays freq1 and freq2 to find the common characters.

Upvotes: 11

Finding common characters in two strings

Answers (13)

Related Questions