Mathieu Dhondt

Reputation: 8914

Finding most frequent combinations of numbers

I have a list of thousands of 7-number sequences, and I want to know which combinations of numbers are most frequent, for combination sizes ranging from 2 to 7 numbers.

So, for instance, in this list:

1, 2, 3, 4, 5, 6, 7
1, 2, 4, 5, 6, 8, 9
1, 2, 9, 10, 12, 15, 27

[1, 2] would be the highest-scoring combination in the 2-number category, [1, 2, 4] would be that for the 3-number category, etc.

I have a feeling numpy or another framework could help me with this, but I don't have any grasp of statistics, and I lack the vocabulary to describe, and hence find, what I want.

Thanks in advance!

Upvotes: 0

Views: 712

Answers (1)

florex

Reputation: 963

You can use a data mining approach to achieve your goal: it is called frequent itemset mining.

Indeed, assuming that:

1, 2, 3, 4, 5, 6, 7
1, 2, 4, 5, 6, 8, 9
1, 2, 9, 10, 12, 15, 27

is your transaction database, where a transaction is a row (for instance: 1, 2, 3, 4, 5, 6, 7) and the items it contains are integers in your case, the goal is then to determine the most frequent itemsets (i.e. the sets of items/integers which occur in the most transactions of the database). pymining is a Python library for this kind of task (https://github.com/bartdag/pymining).
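To make the idea concrete, here is a minimal brute-force sketch using only the Python standard library. It is not pymining's API and not a real frequent itemset mining algorithm: it simply counts every combination of 2 to 7 numbers in each row, which should be acceptable for a few thousand 7-number rows (120 combinations per row). The function name `most_frequent_combinations` and its parameters are made up for this illustration, and it assumes the order of numbers within a row does not matter.

```python
from collections import Counter
from itertools import combinations


def most_frequent_combinations(rows, min_size=2, max_size=7):
    """Hypothetical helper: for each combination size, return the highest
    row count and every combination that reaches it.

    Result shape: {size: (count, [combination, ...])}
    """
    counts = {k: Counter() for k in range(min_size, max_size + 1)}
    for row in rows:
        # Sort and de-duplicate so the same set of numbers always yields
        # the same tuple, whatever the order inside the row.
        items = sorted(set(row))
        for k in range(min_size, min(max_size, len(items)) + 1):
            counts[k].update(combinations(items, k))

    best = {}
    for k, counter in counts.items():
        if not counter:
            continue  # no row was long enough for this size
        top = max(counter.values())
        best[k] = (top, sorted(c for c, n in counter.items() if n == top))
    return best


rows = [
    [1, 2, 3, 4, 5, 6, 7],
    [1, 2, 4, 5, 6, 8, 9],
    [1, 2, 9, 10, 12, 15, 27],
]

result = most_frequent_combinations(rows)
for size, (count, combos) in result.items():
    print(size, count, combos[:3])
```

On the three example rows, (1, 2) is the only pair present in all three rows. In the 3-number category, (1, 2, 4) appears in two rows, but so do several other triples, so the function returns all of the tied combinations rather than a single winner. For much larger or longer transactions, a dedicated itemset mining library with a minimum support threshold is likely a better fit than this exhaustive count.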

Upvotes: 1
