Charles Chow
Charles Chow

Reputation: 625

Find most frequent combination of numbers in a set

This question is related to the following questions:

How to find most frequent combinations of numbers in a list

Most frequently occurring combinations

My problem is:

Scenario: I have a set of numbers, EACH COMBINATION IS UNIQUE in this set and each number in the combination appears only once:

Goal: Find frequency of appears of combination (size of 2) in this set.

Example:

The frequency threshold is 2.

Set = {1,12,13,134,135,235,2345,12345}

The frequency of degree of 2 combination is(show all combinations that appears more than 2 times):

13 - appear 4 times

14 - appear 3 times

23 - appear 3 times

12 - appear 2 times

...

The time complexity of exhaustive searching for all possible combinations grow exponentially.

Can any one help me to think a algorithm that can solve this problem faster? (hash table, XOR, tree search....)

Thank you

PS. Don't worry about the space complexity

Solution and conclusion:

templatetypedef's answer is good for substring' length more than 3

If substring's length is 2, btilly's answer is straight forward and easy to implement (also have a good performance on time)

Upvotes: 2

Views: 1550

Answers (2)

btilly
btilly

Reputation: 46507

Here is pseudo-code whose running time should be O(n * m * m) where n is the size of the set, and m is the size of the things in that set:

let counts be a hash mapping a pair of characters to a count
foreach number N in list:
    foreach pair P of characters in N:
        if exists counts[P]:
            counts[P] = counts[P] + 1
        else:
            counts[P] = 1
 let final be an array of (pair, count)
 foreach P in keys of counts:
     if 1 < counts[P]:
         add (P, counts[P]) to final
 sort final according to the counts
 output final

@templatetypedef's answer is going to eventually be more efficient if you're looking for combinations of 3, 4, etc characters. But this should be fine for the stated problem.

Upvotes: 2

templatetypedef
templatetypedef

Reputation: 373472

You can view this problem as a string problem: given a collection of strings, return all substrings of the collection that appear at least k times. Fortunately, there's a polynomial-time algorithm for this problem That uses generalized suffix trees.

Start by constructing a generalized suffix tree for the string representations of your numbers, which takes time linear in the number of digits across all numbers. Then, do a DFS and annotate each node with the number of leaf nodes in its subtree (equivalently, the number of times the string represented by the node appears in the input set), and in the course of doing so output each string discovered this way to appear at least k times. The runtime for this operation is O(d + z), where d is the number of total digits in the input and z is the total number of digits produced as output.

Hope this helps!

Upvotes: 1

Related Questions