frazman

Reputation: 33243

Efficiently count the elements in the list

I have a list:

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
    # ....
]

Now, I want to count the following:

How many times maroon and black occurred together

How many times purple and black occurred together

How many times maroon and purple occurred together.

And so on. The colors in color_list come from a predefined set of colors, i.e. assume that I have a list of colors ['red', 'green', 'teal', ...] and I basically want to find counts such as: red and green occur together "n" times in color_list, red and teal occur together "m" times, and so on.

Then the next step is to find how many times red, green and blue occur together (taking 3 at a time), and so on.

What is the best way to implement this in Python?

Upvotes: 1

Views: 289

Answers (4)

ElKamina

Reputation: 7807

Your problem is very similar to association rule mining. You should look at: http://orange.biolab.si/doc/ofb/assoc.htm
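In association rule mining, each sublist plays the role of a transaction and each color the role of an item. Two standard measures are support (the fraction of transactions that contain an itemset) and confidence of a rule X → Y (support of X ∪ Y divided by support of X). The following is a minimal plain-Python sketch of those two measures on the five rows shown in the question; it does not use Orange, and the function names `support` and `confidence` are my own.

```python
color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]

def support(row_sets, itemset):
    """Fraction of rows that contain every color in itemset."""
    return sum(itemset <= row for row in row_sets) / len(row_sets)

def confidence(row_sets, antecedent, consequent):
    """Support of (antecedent | consequent) divided by support of antecedent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    return support(row_sets, antecedent | consequent) / support(row_sets, antecedent)

# Each row is treated as a transaction; repeated colors within a row count once.
color_sets = [set(row) for row in color_list]

print(support(color_sets, {'black', 'maroon'}))        # 0.4 (2 of 5 rows)
print(confidence(color_sets, {'black'}, {'maroon'}))   # 1.0
print(confidence(color_sets, {'maroon'}, {'purple'}))  # 0.6
```

A full association-rule miner additionally searches for all itemsets above a minimum support and then derives rules from them; this sketch only evaluates itemsets and rules you name explicitly.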

Upvotes: 1

georg

Reputation: 214959

You can use collections.Counter:

from collections import Counter

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]

# One Counter per sublist, mapping each color to its number of occurrences
cnt = [Counter(x) for x in color_list]

# In each sublist a pair occurs together min(count of x, count of y) times
for x, y in [('black', 'maroon'), ('teal', 'olive')]:
    print(x, y, sum(min(c[x], c[y]) for c in cnt))
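The same Counter idea extends to every pair or triple drawn from a predefined list of colors, which is what the question asks for. In the sketch below, `palette` is a hypothetical stand-in for that predefined list, and `co_occurrence` is a hypothetical helper name; `itertools.combinations` enumerates each k-color combination once, and each combination is scored with the per-sublist minimum count, as in the loop above.

```python
from collections import Counter
from itertools import combinations

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]

# Hypothetical predefined palette; substitute your own list of colors.
palette = ['black', 'maroon', 'olive', 'purple', 'teal']

cnt = [Counter(row) for row in color_list]

def co_occurrence(counters, colors, k):
    """Map each k-color combination to the sum of per-row minimum counts."""
    return {
        combo: sum(min(c[color] for color in combo) for c in counters)
        for combo in combinations(colors, k)
    }

pairs = co_occurrence(cnt, palette, 2)
triples = co_occurrence(cnt, palette, 3)
print(pairs[('black', 'maroon')])  # 2
print(pairs[('maroon', 'teal')])   # 4
```

Because `combinations` preserves the order of `palette`, keys are looked up in palette order (here alphabetical). Looking up a missing color in a `Counter` returns 0, so combinations that never co-occur simply score 0.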

Upvotes: 6

Endophage

Reputation: 21473

It sounds like you're really just looking for every color pair combination that can be made from any given list. I may be off, but if that is your goal, it's a simple problem: take the unique items (a set) and, for n unique items, sum 0 + 1 + ... + (n - 1). This is the standard way of counting pairs where order is not important. Start at the leftmost element of, say, a list of 4 (index 0): there are 3 items to its right that it can be paired with. Move to index 1: the pair with index 0 has already been counted, so there are 2 items to its right that it can be paired with, and so on, giving 3 + 2 + 1 + 0 = 6 = 4 * 3 / 2 pairs. The simple way to do this in Python is just

sum(range(len(set(colors))))

If you need to find pairs among specific colors within your arbitrary list, it's similarly simple:

sum(range(len(set(colors) & set(chosen_colors))))
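The number of unordered pairs that can be formed from n distinct items is 0 + 1 + ... + (n - 1) = n(n - 1)/2, which is also the binomial coefficient C(n, 2). A small sketch that checks this against `math.comb` (available from Python 3.8); `pair_count` and `chosen_colors` are hypothetical names used for illustration.

```python
from math import comb

def pair_count(colors):
    """Number of unordered pairs that can be formed from the distinct colors."""
    n = len(set(colors))
    # 0 + 1 + ... + (n - 1) == n * (n - 1) // 2 == comb(n, 2)
    return sum(range(n))

row = ['purple', 'black', 'maroon', 'maroon', 'maroon']
print(pair_count(row))          # 3 distinct colors -> 3 pairs
print(comb(len(set(row)), 2))   # same value from the closed form

# Hypothetical selection of colors to restrict the pairing to
chosen_colors = {'black', 'maroon', 'teal'}
print(pair_count(set(row) & chosen_colors))  # {'black', 'maroon'} -> 1 pair
```

Note that this counts the pairs that are possible within a single list; it does not tally how often a given pair appears across the sublists of color_list.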

P.S. Set intersection kicks ass.

Upvotes: 1

Gareth Latty

Reputation: 89007

Presuming that you take any number of occurrences in a sublist to mean one 'together':

color_sets = [set(sublist) for sublist in color_list]
looking_for = {"maroon", "black"}
sum(looking_for <= sublist for sublist in color_sets)

This works by turning your lists into sets, then checking whether looking_for is a subset of each set and summing the results (True counts as 1 when summed).
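To get these presence-based counts for every pair (or triple) of a predefined palette at once, the subset test can be applied to each combination produced by `itertools.combinations`. This is a sketch under the same "any number of occurrences counts once" assumption; `palette` and `presence_counts` are hypothetical names.

```python
from itertools import combinations

color_list = [
    ['black', 'maroon', 'maroon', 'maroon', 'maroon'],
    ['purple', 'black', 'maroon', 'maroon', 'maroon'],
    ['maroon', 'purple', 'maroon', 'teal', 'teal'],
    ['maroon', 'maroon', 'purple', 'maroon', 'maroon'],
    ['teal', 'olive', 'teal', 'maroon', 'maroon'],
]

# Hypothetical predefined palette; substitute your own list of colors.
palette = ['black', 'maroon', 'olive', 'purple', 'teal']

def presence_counts(rows, colors, k):
    """For each k-color combination, count the rows that contain all of them.

    Repeated occurrences of a color within one row count once.
    """
    row_sets = [set(row) for row in rows]  # build each set only once
    return {
        combo: sum(set(combo) <= s for s in row_sets)
        for combo in combinations(colors, k)
    }

pairs = presence_counts(color_list, palette, 2)
triples = presence_counts(color_list, palette, 3)
print(pairs[('black', 'maroon')])             # 2
print(triples[('maroon', 'purple', 'teal')])  # 1
```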

Edit:

I've just seen your comment saying you do want the number of occurrences to matter. If that's the case, the simple adaptation of what I had is:

sum(min(sublist.count(item) for item in looking_for) for sublist in color_list)

However, because list.count() scans the whole sublist once for every item in looking_for, this won't be very efficient for larger looking_fors.

Upvotes: 3
