sower
sower

Reputation: 79

Find all anagrams in a string O(n) solution

Here is the problem:

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Input: s: "cbaebabacd" p: "abc"
Output: [0, 6]
Input: s: "abab" p: "ab"
Output: [0, 1, 2]

Here is my solution

vector<int> findAnagrams(string s, string p) {
    vector<int> res, s_map(26,0), p_map(26,0);
    int s_len = s.size();
    int p_len = p.size();
    if (s_len < p_len) return res;
    for (int i = 0; i < p_len; i++) {
        ++s_map[s[i] - 'a'];
        ++p_map[p[i] - 'a'];
    }
    if (s_map == p_map)
        res.push_back(0);
    for (int i = p_len; i < s_len; i++) {
        ++s_map[s[i] - 'a'];
        --s_map[s[i - p_len] - 'a'];
        if (s_map == p_map)
            res.push_back(i - p_len + 1);
    }
    return res;
}

However, I think it is O(n^2) solution because I have to compare vectors s_map and p_map. Does a O(n) solution exist for this problem?

Upvotes: 2

Views: 4874

Answers (4)

Debasish Ghosh
Debasish Ghosh

Reputation: 84

import java.util.*;
public class FindAllAnagramsInAString_438{
    public static void main(String[] args){
        String s="abab";
        String p="ab";
        // String s="cbaebabacd";
        // String p="abc";
        System.out.println(findAnagrams(s,p));
    }
    public static List<Integer> findAnagrams(String s, String p) {
        int i=0;
        int j=p.length();
        List<Integer> list=new ArrayList<>();
        while(j<=s.length()){
            //System.out.println("Substring >>"+s.substring(i,j));
            if(isAnamgram(s.substring(i,j),p)){
                list.add(i);
            }
            i++;
            j++;
        }
        return list;
    }
    public static boolean isAnamgram(String s,String p){
        HashMap<Character,Integer> map=new HashMap<>();
        if(s.length()!=p.length()) return false;
        for(int i=0;i<s.length();i++){
            char chs=s.charAt(i);
            char chp=p.charAt(i);
            map.put(chs,map.getOrDefault(chs,0)+1);
            map.put(chp,map.getOrDefault(chp,0)-1);
        }
        for(int val:map.values()){
            if(val!=0) return false;
        }
        return true;
    }
}

Upvotes: 1

jbapple
jbapple

Reputation: 3375

// In papers on string searching algorithms, the alphabet is often
// called Sigma, and it is often not considered a constant. Your
// algorthm works in (Sigma * n) time, where n is the length of the
// longer string. Below is an algorithm that works in O(n) time even
// when Sigma is too large to make an array of size Sigma, as long as
// values from Sigma are a constant number of "machine words".

// This solution works in O(n) time "with high probability", meaning
// that for all c > 2 the probability that the algorithm takes more
// than c*n time is 1-o(n^-c). This is a looser bound than O(n)
// worst-cast because it uses hash tables, which depend on randomness.
#include <functional>
#include <iostream>
#include <type_traits>
#include <vector>
#include <unordered_map>
#include <vector>

using namespace std;
// Finding a needle in a haystack. This works for any iterable type
// whose members can be stored as keys of an unordered_map.
template <typename T>
vector<size_t> AnagramLocations(const T& needle, const T& haystack) {
  // Think of a contiguous region of an ordered container as
  // representing a function f with the domain being the type of item
  // stored in the container and the codomain being the natural
  // numbers. We say that f(x) = n when there are n x's in the
  // contiguous region.
  //
  // Then two contiguous regions are anagrams when they have the same
  // function. We can track how close they are to being anagrams by
  // subtracting one function from the other, pointwise. When that
  // difference is uniformly 0, then the regions are anagrams.
  unordered_map<remove_const_t<remove_reference_t<decltype(*needle.begin())>>,
                intmax_t> difference;
  // As we iterate through the haystack, we track the lead (part
  // closest to the end) and lag (part closest to the beginning) of a
  // contiguous region in the haystack. When we move the region
  // forward by one, one part of the function f is increased by +1 and
  // one part is decreased by -1, so the same is true of difference.
  auto lag = haystack.begin(), lead = haystack.begin();
  // To compare difference to the uniformly-zero function in O(1)
  // time, we make sure it does not contain any points that map to
  // 0. The the property of being uniformly zero is the same as the
  // property of having an empty difference.
  const auto find = [&](const auto& x) {
    difference[x]++;
    if (0 == difference[x]) difference.erase(x);
  };
  const auto lose = [&](const auto& x) {
    difference[x]--;
    if (0 == difference[x]) difference.erase(x);
  };
  vector<size_t> result;
  // First we initialize the difference with the first needle.size()
  // items from both needle and haystack.
  for (const auto& x : needle) {
    lose(x);
    find(*lead);
    ++lead;
    if (lead == haystack.end()) return result;
  }
  size_t i = 0;
  if (difference.empty()) result.push_back(i++);
  // Now we iterate through the haystack with lead, lag, and i (the
  // position of lag) updating difference in O(1) time at each spot.
  for (; lead != haystack.end(); ++lead, ++lag, ++i) {
    find(*lead);
    lose(*lag);
    if (difference.empty()) result.push_back(i);
  }
  return result;
}
int main() {
  string needle, haystack;
  cin >> needle >> haystack;
  const auto result = AnagramLocations(needle, haystack);
  for (auto x : result) cout << x << ' ';
}

Upvotes: 1

Daniel
Daniel

Reputation: 7714

lets say p has size n.

lets say you have an array A of size 26 that is filled with the number of a,b,c,... which p contains.

then you create a new array B of size 26 filled with 0.

lets call the given (big) string s.

first of all you initialize B with the number of a,b,c,... in the first n chars of s.

then you iterate through each word of size n in s always updating B to fit this n-sized word.

always B matches A you will have an index where we have an anagram.

to change B from one n-sized word to another, notice you just have to remove in B the first char of the previous word and add the new char of the next word.

Look at the example:

Input
s: "cbaebabacd" 
p: "abc"          n = 3 (size of p)

A = {1, 1, 1, 0, 0, 0, ... }  // p contains just 1a, 1b and 1c.

B = {1, 1, 1, 0, 0, 0, ... }  // initially, the first n-sized word contains this.

compare(A,B)

for i = n; i < size of s; i++ {
    B[ s[i-n] ]--;
    B[ s[ i ] ]++;
    compare(A,B)
}

and suppose that compare(A,B) prints the index always A matches B.

the total complexity will be:

first fill of A  = O(size of p)
first fill of B  = O(size of s)
first comparison = O(26)
for-loop = |s| * (2 + O(26)) = |s| * O(28) = O(28|s|) = O(size of s)
____________________________________________________________________
2 * O(size of s) + O(size of p) + O(26)

which is linear in size of s.

Upvotes: 2

user3386109
user3386109

Reputation: 34829

Your solution is the O(n) solution. The size of the s_map and p_map vectors is a constant (26) that doesn't depend on n. So the comparison between s_map and p_map takes a constant amount of time regardless of how big n is.

Your solution takes about 26 * n integer comparisons to complete, which is O(n).

Upvotes: 1

Related Questions