Reputation: 625
This question is related to the following questions:
My problem is:
Scenario:
I have a set of numbers
, EACH COMBINATION IS UNIQUE
in this set and each number in the combination appears only once:
Goal: Find frequency of appears of combination (size of 2) in this set.
Example:
The frequency threshold is 2.
Set = {1,12,13,134,135,235,2345,12345}
The frequency of degree of 2 combination is(show all combinations that appears more than 2 times):
13 - appear 4 times
14 - appear 3 times
23 - appear 3 times
12 - appear 2 times
...
The time complexity of exhaustive searching for all possible combinations grow exponentially.
Can any one help me to think a algorithm that can solve this problem faster? (hash table, XOR, tree search....)
Thank you
PS. Don't worry about the space complexity
Solution and conclusion:
templatetypedef
's answer is good for substring' length more than 3
If substring's length is 2, btilly
's answer is straight forward and easy to implement (also have a good performance on time)
Upvotes: 2
Views: 1550
Reputation: 46507
Here is pseudo-code whose running time should be O(n * m * m)
where n
is the size of the set, and m
is the size of the things in that set:
let counts be a hash mapping a pair of characters to a count
foreach number N in list:
foreach pair P of characters in N:
if exists counts[P]:
counts[P] = counts[P] + 1
else:
counts[P] = 1
let final be an array of (pair, count)
foreach P in keys of counts:
if 1 < counts[P]:
add (P, counts[P]) to final
sort final according to the counts
output final
@templatetypedef's answer is going to eventually be more efficient if you're looking for combinations of 3, 4, etc characters. But this should be fine for the stated problem.
Upvotes: 2
Reputation: 373472
You can view this problem as a string problem: given a collection of strings, return all substrings of the collection that appear at least k times. Fortunately, there's a polynomial-time algorithm for this problem That uses generalized suffix trees.
Start by constructing a generalized suffix tree for the string representations of your numbers, which takes time linear in the number of digits across all numbers. Then, do a DFS and annotate each node with the number of leaf nodes in its subtree (equivalently, the number of times the string represented by the node appears in the input set), and in the course of doing so output each string discovered this way to appear at least k times. The runtime for this operation is O(d + z), where d is the number of total digits in the input and z is the total number of digits produced as output.
Hope this helps!
Upvotes: 1