Reputation: 5476
I have been trying to implement an efficient string comparing algorithm that will given points according to character changes.
For example:
String #1: abcd
String #2: acdb
Initial Point: 0
In here String #2 character c
changed it's index from 2 to 1, and d changed its index from 4 to 3. Both of them (2-1=1
and 4-3=1
) adds up to 2 points to initial point. Not a homework or anything, I just didn't want to create a basic for loop comparing each character one by one and wanted to ask if anything efficient method (like hashing etc.) can be applied?
Upvotes: 0
Views: 2376
Reputation: 40669
It sounds like what you really want is something like Levenshtein distance (but not exactly that). Here's a first cut.
What it does is walk the game tree of all possible rearrangements of a to see if they match b. It associates a cost with each rearrangement, expressed as a declining budget.
The outer loop walks first with a budget of 0, so only an exact match is allowed.
If it got no success, then it walks with a budget of 1, finding all matches containing only one rearrangement.
If it got no success, then it walks with a budget of 2, and so on.
As it matches, it keeps an array of integers delta, telling how far each element of a has been swapped. Whenever it gets a success, it somwhow prints out that delta array, and that is your record of what swaps were done to get that match.
void walk(char* a, char* b, int* delta, int budget, int& nSuccess){
delta[0] = 0;
if (budget < 0) return;
if (a[0] == '\0' && b[0] == '\0'){ // end of both strings
nSuccess++;
// print out the deltas
return;
}
if (a[0] == '\0') return; // excess chars in b
if (b[0] == '\0') return; // excess chars in a
if (a[0] == b[0]){ // first chars are equal, move to next
walk(a+1, b+1, delta+1, budget, nSuccess);
return;
}
for (int i = 1; a[i] != '\0'; i++){
delta[0] = i;
swap(a[0], a[i]);
if (a[0] == b[0]){
walk(a+1, b+1, delta+1, budget-1, nSuccess);
}
swap(a[0], a[i]);
delta[0] = 0;
}
}
void top(char* a, char* b){
int nSuccess = 0;
int delta[512];
for (int budget = 0; nSuccess==0; budget++){
walk(a, b, budget, delta, nSuccess);
}
}
The performance of the algorithm is exponential in N, where N is the minimum number of rearrangements needed to make the strings match. So it probably should not be used until after you've verified that each strings has the same number of each character, and only use it if you need to see that record of rearrangements.
Upvotes: 1
Reputation: 126787
You are overcomplicating a simple thing. You can't get more efficient than comparing each character and stopping the comparison at the first character you find different - which is basically what strcmp
does. The only typical optimization you can do is, if you already know the length of the two strings (as happens when you use std::string
or other counted strings), to determine them unequal immediately if the two length differ.
Upvotes: 3