Reputation: 27
I wrote code to find a substring in another string using hashing, but it's giving me a wrong result.
A description of how the code works:
n
powers of p=31
in array pows
.s[0..i]
in the array h
.h
array and store it in a set.t
and store its hash.t
and hashes in the set.The hash h[n2-1]
should exist in the set but it does not. Could you help me find the bug in the code?
Note: When I use the modular inverse instead of multiplying by pows[i-8]
the code runs well.
#include <bits/stdc++.h>
#define m 1000000007
#define N (int)2e6 + 3
using namespace std;
long long pows[N], h[N], h2[N];
set<int> ss;
int main() {
string s = "www.cplusplus.com/forum";
// powers array
pows[0] = 1;
int n = s.length(), p = 31;
for (int i = 1; i < n; i++) {
pows[i] = pows[i - 1] * p;
pows[i] %= m;
}
// hash from 0 to i array
h[0] = s[0] - 'a' + 1;
for (int i = 1; i < n; i++) {
h[i] = h[i - 1] + (s[i] - 'a' + 1) * pows[i];
h[i] %= m;
}
// storing each hash with 9 characters in a set
ss.insert(h[8]);
for (int i = 9; i < n; i++) {
int tp = h[i] - h[i - 9] * pows[i - 8];
tp %= m;
tp += m;
tp %= m;
ss.insert(tp);
}
// print hashes with 9 characters
set<int>::iterator itr = ss.begin();
while (itr != ss.end()) {
cout << *(itr++) << " ";
}
cout << endl;
// t is the string that i want to check if it is exist in s
string t = "cplusplus";
int n2 = t.length();
h2[0] = t[0] - 'a' + 1;
for (int i = 1; i < n2; i++) {
h2[i] = h2[i - 1] + (t[i] - 'a' + 1) * pows[i];
h2[i] %= m;
}
// print t hash
cout << h2[n2 - 1] << endl;
return 0;
}
Upvotes: 0
Views: 1360
Reputation: 3204
I can see two problems with your code:
long long
) in an int
variable. This could cause integer overflow and the hash you computed would probably be incorrect.s = {s[0], s[1], ..., s[n-1]}
, the way you're computing the hash is: h = ∑ s[i] * p^i
. In this case, given the prefix hash stored in h
, the hash for a substring s[l..r]
(inclusive) should be (h[r] - h[l - 1]) / p^(r-l+1)
, instead of what you wrote. This is also why using modular inverse (which is required to perform division under modulo) is correct.I think a more common way to compute hashes is the other way around, i.e. h = ∑ s[i] * p^(n-i-1)
. This allows you to compute the substring hash as h[r] - h[l - 1] * p^(r-l+1)
, which does not require computing modular inverses.
Upvotes: 2