Reputation: 8914
I have a list of thousands of 7-number sequences and I want to know which combinations of numbers occur most frequently, for combination sizes ranging from 2 to 7 numbers.
So, for instance, in this list:
1, 2, 3, 4, 5, 6, 7
1, 2, 4, 5, 6, 8, 9
1, 2, 9, 10, 12, 15, 27
[1, 2]
would be the highest scoring sequence in the 2-number category
[1, 2, 4]
would be that for the 3-number category
etc.
I have a feeling NumPy or another framework could help me with this, but I don't have any grasp of statistics and I lack the vocabulary to describe, and hence find, what I want.
Thanks in advance!
Upvotes: 0
Views: 712
Reputation: 963
You can use a data mining approach to achieve your goal. It is called frequent itemset mining.
Assume that:
1, 2, 3, 4, 5, 6, 7
1, 2, 4, 5, 6, 8, 9
1, 2, 9, 10, 12, 15, 27
is your transaction database, where each row is a transaction (for instance: 1, 2, 3, 4, 5, 6, 7) and each transaction contains items, which are integers in your case. The goal is then to find the most frequent itemsets, i.e. the sets of items (integers) that occur in the most transactions of the database. pymining is a Python library for this kind of task (https://github.com/bartdag/pymining).
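To make the idea concrete, here is a minimal sketch that counts itemsets by brute force using only the standard library. It does not use pymining's API or a real frequent itemset mining algorithm. It simply enumerates every sub-combination of each row, which is feasible here because a 7-item row has only 120 subsets of size 2 to 7. The function name `most_frequent_itemsets` and its parameters are made up for illustration, and it assumes that order within a row does not matter.

```python
from collections import Counter
from itertools import combinations


def most_frequent_itemsets(transactions, min_size=2, max_size=7):
    """Return {size: (support, [itemsets])} with the top itemsets per size.

    Hypothetical helper: brute-force counting, not an optimized miner.
    """
    counts = {k: Counter() for k in range(min_size, max_size + 1)}
    for row in transactions:
        # Sort and deduplicate so the same set of numbers always
        # produces the same tuple, whatever the order in the row.
        items = sorted(set(row))
        for k in range(min_size, min(max_size, len(items)) + 1):
            counts[k].update(combinations(items, k))

    best = {}
    for k, counter in counts.items():
        if not counter:
            continue
        top = max(counter.values())
        # Keep every itemset that reaches the top count, so ties are visible.
        best[k] = (top, sorted(c for c, n in counter.items() if n == top))
    return best


transactions = [
    [1, 2, 3, 4, 5, 6, 7],
    [1, 2, 4, 5, 6, 8, 9],
    [1, 2, 9, 10, 12, 15, 27],
]

result = most_frequent_itemsets(transactions)
for size, (support, itemsets) in result.items():
    print(size, support, itemsets[:3])
```

On the three example rows, (1, 2) is the only pair found in all three rows. For size 3 the top count is 2 and several triples tie at that count, (1, 2, 4) among them, which is why the sketch returns a list per size rather than a single winner. For much longer rows or larger databases, a dedicated miner such as the one in pymining would likely scale better than this enumeration.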
Upvotes: 1