Reputation: 6871
This is a code written in C++ using standard libraries to find the String Similarity of a string S with each of it's suffixes.
Though, it gives the correct output, it takes a lot of time in doing so for large strings. Here's the code:
#include <iostream>
#include <string>
using namespace std;
int sim(string a, string b){
int count=0;
int sa=a.size();
int sb=b.size();
int iter;
if(sa>sb) iter=sb;
else iter=sa;
for(int i=0; i<iter; i++){
if (a[i]!=b[i]) break;
else count++;
}
return count;
}
int strsim(string a){
int sum=0;
int s=a.size();
for(int i=0; i<s; i++){
sum=sum+sim(a,a.substr(i));
}
return sum;
}
int main(){
int n;
cin >> n;
string a[n];
for(int i=0; i<n; i++){
cin >> a[i];
}
for(int i=0; i<n; i++){
cout << strsim(a[i]) << '\n';
}
}
Constraints: The length of each string is at most 100000 and contains only lower case characters and the number of testcases, 'n' cannot exceed 10.
Sample I/O:
Input:
1 ababaa
Output:
11
i.e, 6 + 0 + 3 + 0 + 1 + 1 = 11
Upvotes: 6
Views: 382
Reputation: 1169
An addition of one more line to your strsim() function reduce some additional functional calls to your sim() function. As we know that cost of a function call(consumes extra memory and processing for "prologue" and "epilogue" mechanism) is quite considerable when we are talking about performance.
So call sim function only if the first character of the two strings in your case "a" and "a.substring" are equal.
int strsim(string a){
int sum=0;
int s=a.size();
for(int i=0; i<s; i++){
if(a[i] == a[0]) //add this extra line
sum=sum+sim(a,a.substr(i));
}
return sum;
}
Upvotes: -1
Reputation: 10667
First you should check if there isn't a better algorithm. Then, depending on how much you are ready to invest and loose portability you may want to manually vectorize you code, if you compiler didn't manage to do it. Using gcc intrinsics (see eg This tutorial) I've been able to get a x6 speedup on a similar code.
Upvotes: 0
Reputation: 2934
Here is as efficient as you can make it (without going into FFT territory):
sum_i=0^j sum_j=0^s f_i,j
int strsim(const string &a){
int s=a.size();
int sum=0;
for(int i=0; i<s; i++){
for (int j=i;j<s;++j){
if (a[j-i]!=a[i]) break;
sum ++;
}
}
return sum;
}
I know it seems different than your code, but (unless I have a bug) it should return the same thing.
Try it and let me know!
Upvotes: 0
Reputation: 3881
Your current code computes for a single string for length L, in O(L^3)
(substr takes linear running time). Not to mention high constant costs to the above complexity due to inefficient passing around of strings.
Your algorithm can be simply reduced to finding longest common prefixes of a string with all its suffixes. This can be easily done using Suffix Aray. The concept cannot be explained as an answer, so I will highly recommend you to read this.
A sub optimal and simple to code Suffix Array solution will have O(Llg^2(L))
(L = string length) construction time and O(1)
time to query Longest Common Prefix of 2 suffixes using Range Minimum Query. Note that the whole string itself is its own suffix. In your case, you need to make L queries for each string. So total complexity for one string will be O(Llg^2(L)) + O(L)
.
If you want to improve further, You may reduce the construction time to O(Llg(L))
by using radix sort, or to O(L)
(Read)
Upvotes: 7
Reputation: 24269
The biggest cost you have here is that you are passing the strings by value - that means that each call to "sim" creates two brand new strings and copies over the data to them. You should eliminate this and replace it with pass-by-reference.
#include <iostream>
#include <string>
#include <vector>
using namespace std;
size_t compareSubstring(const std::string& str, size_t offset)
{
size_t count = 0;
std::string::const_iterator lhsIt = str.begin();
std::string::const_iterator rhsIt = str.begin() + offset;
for ( ; rhsIt != str.end() ; ++lhsIt, ++rhsIt ) {
if (*lhsIt != *rhsIt)
break;
++count;
}
return count;
}
int compareString(const string& str)
{
size_t count = 0;
const size_t length = str.size();
for(size_t i = 0; i < length; ++i) {
count += compareSubstring(str, i);
}
return count;
}
int main()
{
size_t numStrings = 0;
std::cin >> numStrings;
std::vector<std::string> strings;
strings.resize(numStrings);
for(size_t i = 0; i < numStrings; ++i) {
std::cin >> strings[i];
}
for(size_t i = 0; i < numStrings; ++i) {
std::cout << compareString(strings[i]) << std::endl;
}
}
Upvotes: 3