Reputation: 137
I have a small problem. I'm solving one programming task, but have a problem with it. It is simple one, but time limit make it a bit harder.
Find number of occurrences of substring. You will be given M - length of substring; substring to find, N - length of base string; base string.
M <= 100 000
N<= 200 000Input
10
budsvabbud
79
uaahskuskamikrofonubudsvabbudnebudlabutkspkspkspmusimriesitbudsvabbudsvabbudnelOutput
3
I tried to use using build-in function find,but it wasn't fast enough:
#include<iostream>
#include<string>
using namespace std;
int main()
{
int n;
int occurrences = 0;
string::size_type start = 0;
string base_string, to_find;
cin >> n >> to_find >> n >> base_string;
while ((start = base_string.find(to_find, start)) != string::npos) {
++occurrences;
start++;; // see the note
}
cout << occurrences << endl;
}
So I tried to write my own function, but it was even slower:
#include<iostream>
#include<cstdio>
#include<string>
#include<queue>
using namespace std;
int main()
{
int n, m;
string to_find;
queue<int> rada;
int occurrences = 0;
cin >> m >> to_find >> n;
for (int i = 0; i < n; i++)
{
char c;
scanf(" %c", &c);
int max = rada.size();
for (int j = 0; j < max; j++)
{
int index = rada.front();
rada.pop();
if (c == to_find[index])
{
if (++index == m) {
occurrences++;
}
else
rada.push(index);
}
}
if (c == to_find[0])
{
if (1 == m)
n++;
else
rada.push(1);
}
}
cout << occurrences << endl;
}
I know some people did this in 0 ms, but my first code needs more than 2000 ms and the second one a lot more than that. Have you any ideas how to solve this? Thanks.
EDIT: Limits of length:
M <= 100 000 - length of substring
N<= 200 000 - lenght of base string
Upvotes: 0
Views: 237
Reputation: 1769
The algorithm you presented is an O(M*N), where N is the length of the text and M is the length of the searched world. Usually, also the libraries implement the naive algorithm. However, there is an algorithm by Knuth, Morrison and Pratt, which does it in a O(M+N) time. See, e.g., Wikipedia Knuth-Morrison-Pratt Algorithm. It has some variations which might be easier to implement like Boyer-Moore-Horsepool.
Upvotes: 2
Reputation: 1035
static size_t findOccurences(const char * const aInput, const char * const aDelim)
{
if (aInput == 0x0 || aDelim == 0x0)
{
throw std::runtime_error("Argument(s) null");
}
const size_t inputLength = strlen(aInput);
const size_t delimLength = strlen(aDelim);
size_t result = 0;
if (delimLength <= inputLength && delimLength > 0)
{
size_t delimIndex = 0;
for (size_t inputIndex = 0; inputIndex < inputLength; inputIndex++)
{
if (aInput[inputIndex] != aDelim[delimIndex])
{
delimIndex = 0;
}
else
{
delimIndex++;
if (delimIndex == delimLength)
{
delimIndex = 0;
result++;
}
}
}
}
return result;
}
static size_t unsafeFindOccurences(const char * const aInput, const char * const aDelim)
{
const size_t inputLength = strlen(aInput);
const size_t delimLength = strlen(aDelim);
size_t result = 0;
size_t delimIndex = 0;
for (size_t inputIndex = 0; inputIndex < inputLength; inputIndex++)
{
if (aInput[inputIndex] != aDelim[delimIndex])
{
delimIndex = 0;
}
else
{
delimIndex++;
if (delimIndex == delimLength)
{
delimIndex = 0;
result++;
}
}
}
return result;
}
x86 x64
Debug 5501ms 5813ms
Release 3889ms 3998ms
x86 x64
Debug 5442ms 5564ms
Release 3074ms 3139ms
Compiled with Visual Studio 2015, Visual Studio 2015 (v140) toolset under Windows 10 x64 Pro.
Using this input. Searching for 'ad' and 1.000.000 iterations.
Upvotes: 1
Reputation: 20311
I try this code in debug mode without any optimization and it took 11 mSec. VS.NET 2013 , Intel Core i7:
int main()
{
int n;
int occurrences = 0;
string::size_type start = 0;
string base_string, to_find;
base_string.reserve(200000);
to_find.reserve(100000);
for (size_t i = 0; i < 100000; i++){
base_string.push_back('a');
}
for (size_t i = 0; i < 100000; i++){
base_string.push_back('b');
}
for (size_t i = 0; i < 100000; i++){
to_find.push_back('b');
}
auto start_s = clock();
while ((start = base_string.find(to_find, start)) != string::npos) {
++occurrences;
start++;; // see the note
}
auto stop_s = clock();
std::cout << (stop_s - start_s) / double(CLOCKS_PER_SEC) * 1000;
cout << occurrences << endl;
std::getchar();
}
There is a problem in compiler, configuration, your machine, but in your code.
Upvotes: 0