Reputation: 26335
I have two strings, "hello" an "eo" for instance, and I wish to find duplicate characaters between the two strings, that is: 'e' and 'o' in this example.
My algorithm would go this way
void find_duplicate(char* str_1, char* str_2, int len1, int len2)
{
char c ;
if(len1 < len2)
{
int* idx_1 = new int[len1]; // record elements in little string
// that are matched in big string
for(int k = 0 ; k < len1 ; k++)
idx_1[k] = 0;
int* idx_2 = new int[len2]; // record if element in str_2 has been
// matched already or not
for(int k = 0 ; k < len2 ; k++)
idx_2[k] = 0;
for(int i = 0 ; i < len2 ; i++)
{
c = str_1[i];
for(int j = 0 ; j < len1 ; j++)
{
if(str_2[j] == c)
{
if(idx_2[j] == 0) // this element in str_2 has not been matched yet
{
idx_1[i] = j + 1; // mark ith element in idx as matched in string 2 at pos j
idx_2[j] = 1;
}
}
}
}
// now idx_1 and idx_2 contain matches info, let's remove matches.
char* str_1_new = new char[len1];
char* str_2_new = new char[len2];
int kn = 0;
for(int k = 0 ; k < len1 ; k++)
{
if(idx_1[k] > 0)
{
str_1_new[kn] = str_1[k];
kn++;
}
}
kn = 0;
for(int k = 0 ; k < len2 ; k++)
{
if(idx_2[k] > 0)
{
str_2_new[kn] = str_2[k];
kn++;
}
}
}
else
{
// same here, switching roles (do it yourself)
}
}
i feel my solution is awkward: - symetry of both cases in first if/else and code duplication - time complexity: 2*len1*len2 operations for finding duplicates, then len1 + len2 operations for removal - space complexity: two len1 and two len2 char*.
What if len1
and len2
are not given (with and without resort to STL vector) ?
could you provide your implementation of this algo ?
thanks
Upvotes: 4
Views: 1823
Reputation: 10917
std::vector<char> duplicates;
for (auto c1: std::string(str_1))
for (auto c2: std::string(str_2))
if (c1 == c2)
duplicates.push_back(c1);
or if you don't have a c++11 compliant compiler.
std::vector<char> duplicates;
std::string s1(str_1);
std::string s2(str_2);
for (std::size_t i = 0; i < s1.size(); i++)
for (std::size_t j = 0; j < s2.size(); j++)
if (s1[i] == s2[j])
duplicates.push_back(s1[i]);
-- based on Zaroth answer
std::vector<int> count(256,0);
for (auto c : std::string(str_1))
count[c] += 1;
for (auto c : std::string(str_2))
if (count[c] > 0)
duplicates.push_back(c);
Upvotes: 0
Reputation: 578
First of all, it isn't substring matching problem - it is problem of finding common characters between two strings.
Your solution works in O(n*m), where n=len1 and m=len2 in your code. You could easily solve the same problem in O(n+m+c) time by counting characters in each of strings (where c is equal to size of character set). This algorithm is called counting sort.
Sample code implementing this in your case:
#include <iostream>
#include <cstring> // for strlen and memset
const int CHARLEN = 256; //number of possible chars
using namespace std;
// returns table of char duplicates
char* find_duplicates(const char* str_1, const char* str_2, const int len1, const int len2)
{
int *count_1 = new int[CHARLEN];
int *count_2 = new int[CHARLEN];
char *duplicates = new char[CHARLEN+1]; // we hold duplicate chars here
int dupl_len = 0; // length of duplicates table, we insert '\0' at the end
memset(count_1,0,sizeof(int)*CHARLEN);
memset(count_2,0,sizeof(int)*CHARLEN);
for (int i=0; i<len1; ++i)
{
++count_1[str_1[i]];
}
for (int i=0; i<len2; ++i)
{
++count_2[str_2[i]];
}
for (int i=0; i<CHARLEN; ++i)
{
if (count_1[i] > 0 && count_2[i] > 0)
{
duplicates[dupl_len] = i;
++dupl_len;
}
}
duplicates[dupl_len]='\0';
delete count_1;
delete count_2;
return duplicates;
}
int main()
{
const char* str_1 = "foobar";
const char* str_2 = "xro";
char* dup = find_duplicates(str_1, str_2, strlen(str_1), strlen(str_2));
cout << "str_1: \"" << str_1 << "\" str_2: \"" << str_2 << "\"\n";
cout << "duplicates: \"" << dup << "\"\n";
delete dup;
return 0;
}
Please note that I am also sorting the output here. If you do not want to do that, you can skip the counting of characters in second string and just start comparing duplicates on the go.
If you, however, intend to be able to detect multiple duplicates of the same letter (e.g. if "banana" and "arena" should output "aan" instead of "an"), then you can just substract the number of counts in current solution and adjust the output accordingly.
Upvotes: 3