Reputation: 111
Suppose you have an input character stream (StreamReader) and you have a list of pattens like ABCAS, ASGKT. KHTSD etc. You need to read the stream one by one character and you need to keep count on number of each pattern found so far and when EOF occures just print all the patterns along with count.
Example:
Input String: 011100010
Pattern 1: 011
Pattern 2: 010
Output:
011 => 1
010 => 1
Upvotes: 1
Views: 171
Reputation: 4435
There are existing tools for solving this problem. They are called lexers, and on of the most well-known example is lex, mentioned in the comments.
However, if you need to implement it yourself, Aho-Corasick algorithm does the thing. Given a set of strings S = s_1, ..., s_k of total length L, it builds a trie on these strings and makes an automaton based on this trie. Now you feed your input string into this automaton. At any time it is capable of telling you the set of strings from S that end at current position, with time proportional to the number of such strings.
For example, if S = {"aba", "bababa", "ba", "abacaba"} and your have inputted the string rabacaba, the output of the algorithm will be {0, 2, 3} (the indices of the strings which end at the last fed position).
Having such structure, maintaining occurred strings and their count is straightforward: just make a query to the structure after each character of the stream.
Upvotes: 1