Reputation: 33243
I have a list:
color_list = [ ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
['purple', 'black', 'maroon', 'maroon', 'maroon'],
['maroon', 'purple', 'maroon', 'teal', 'teal'],
['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
['teal', 'olive', 'teal', 'maroon', 'maroon'],
....
]
Now, I want to count the following:
How many times maroon and black occurred together
How many times purple and black occurred together
How many times maroon and purple occurred together.
and so on. The colors in color_list come from a predefined list of colors. That is, assume I have a list of colors ['red','green','teal'....] and I want to find that red and green occur together "n" times in color_list, red and teal occur together "m" times, and so on.
The next step is then to find how many times red, green and blue occur together (taking 3 at a time).
What is the best way to implement this in Python?
Upvotes: 1
Views: 289
Reputation: 7807
Your problem is very similar to Association Rule Mining. You should look at: http://orange.biolab.si/doc/ofb/assoc.htm .
Upvotes: 1
Reputation: 214959
You can use collections.Counter:
from collections import Counter
color_list = [
['black', 'maroon', 'maroon', 'maroon', 'maroon'],
['purple', 'black', 'maroon', 'maroon', 'maroon'],
['maroon', 'purple', 'maroon', 'teal', 'teal'],
['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
['teal', 'olive', 'teal', 'maroon', 'maroon']
]
# one Counter per sublist: color -> number of occurrences in that sublist
cnt = [Counter(x) for x in color_list]
for x, y in [('black', 'maroon'), ('teal', 'olive')]:
    # a pair counts min(count of x, count of y) times in each sublist
    print(x, y, sum(min(c[x], c[y]) for c in cnt))
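To cover every pair (and then every triple) of the predefined colors instead of a hand-written list of pairs, the same per-sublist Counter idea can be combined with itertools.combinations. This is only a sketch, written for Python 3: the helper name together_counts and the colors list are made up for illustration, and "together" is assumed to mean the min-of-counts definition used above.

```python
from collections import Counter
from itertools import combinations

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]
# Assumed stand-in for the asker's predefined list of colors.
colors = ['black', 'maroon', 'purple', 'teal', 'olive']


def together_counts(color_list, colors, size):
    """Map each `size`-tuple of colors to its summed min-of-counts score."""
    counters = [Counter(sub) for sub in color_list]
    result = {}
    for combo in combinations(colors, size):
        # In each sublist, the combination occurs as often as its rarest member.
        total = sum(min(c[name] for name in combo) for c in counters)
        if total:  # skip combinations that never occur together
            result[combo] = total
    return result


pairs = together_counts(color_list, colors, 2)
triples = together_counts(color_list, colors, 3)
print(pairs[('black', 'maroon')])
print(triples[('black', 'maroon', 'purple')])
```

The keys follow the order of colors, so look up ('black', 'maroon') rather than ('maroon', 'black').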
Upvotes: 6
Reputation: 21473
It sounds like you're really just looking for every color pair combination that can be made from any given list. I may be off, but if that is your goal, it's a simple problem. You just need to get the unique items (a set) and sum the integers from 0 up to the number of unique items minus 1. This is a standard way of counting pairs where order is not important. Say you start at the leftmost element of a list of 4, index 0: there are 3 items to its right that it can be paired with. Move to index 1: we've already counted the pair with index 0, so there are 2 items to its right that it can be paired with, and so on, giving 3 + 2 + 1 = 6 pairs. The simple way to do this in Python is just
sum(range(len(set(colors))))
If you have specific colors you need to find pairs of within your arbitrary list, it's similarly simple:
sum(range(len(set(colors) & set(chosen_colors))))
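The counting argument (3 + 2 + 1 for four unique items) is the same number as n(n-1)/2, and it can be cross-checked by enumerating the pairs with itertools.combinations. A small sketch, using a made-up colors list:

```python
from itertools import combinations

colors = ['black', 'maroon', 'maroon', 'purple', 'teal']  # made-up sample data
unique = set(colors)
n = len(unique)  # 4 distinct colors

by_sum = sum(range(n))            # 0 + 1 + ... + (n - 1)
by_formula = n * (n - 1) // 2     # closed form of the same sum
by_enumeration = len(list(combinations(unique, 2)))  # actually list the pairs

print(by_sum, by_formula, by_enumeration)
```

All three values agree, which is a quick way to catch an off-by-one in the range bounds.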
P.S. Set intersection kicks ass.
Upvotes: 1
Reputation: 89007
Presuming that you take any number of occurrences in a sublist to mean one 'together':
color_sets = [set(sublist) for sublist in color_list]
looking_for = {"maroon", "black"}
sum(looking_for <= color_set for color_set in color_sets)
This works by making your lists into sets, then checking if looking_for is a subset of each of the sets, summing the result (as True counts as 1 as an integer).
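If you want this for every pair or triple at once rather than one looking_for at a time, one possible approach is to feed the combinations of each sublist's set into a collections.Counter. This is a sketch under the same "any number of occurrences counts once" assumption, written for Python 3; the function name cooccurrence is made up for illustration.

```python
from collections import Counter
from itertools import combinations

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]


def cooccurrence(color_list, size):
    """Count, per `size`-combination, the sublists that contain all its colors."""
    counts = Counter()
    for sublist in color_list:
        # sorted() gives every combination one canonical (alphabetical) order,
        # so ('black', 'maroon') and ('maroon', 'black') share a single key.
        counts.update(combinations(sorted(set(sublist)), size))
    return counts


pair_counts = cooccurrence(color_list, 2)
triple_counts = cooccurrence(color_list, 3)
print(pair_counts[('black', 'maroon')])
print(triple_counts[('black', 'maroon', 'purple')])
```

Keys are alphabetically sorted tuples, and a combination that never appears simply reads as 0 because Counter returns 0 for missing keys.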
Just seen your comment saying you do want the number of occurrences to matter. If that's the case, then the simple adaptation of what I had is:
sum(min(sublist.count(item) for item in looking_for) for sublist in color_list)
However, as list.count() is called so many times, this won't be very efficient for larger looking_for sets.
Upvotes: 3