Reputation: 103
I have an array of values called MyFruits like:
[apple, orange, banana, apple, pear]
Then I have a list of arrays like:
[apple, orange]
[blueberry, watermelon, pear]
[grape, orange, grape, orange]
[]
[cantaloupe]
For each of the arrays in the list, I want to get the count of elements that intersect with MyFruits array divided by the total number of elements in the array. So the output would be:
2 / 2 = 1
1 / 3 = 0.33333
2 / 4 = 0.5
0 / 0 = (in this case 0)
0 / 1 = 0
essentially:
[1, 0.33333, 0.5, 0, 0]
I've been doing this in Python with for loops, but the data set is huge and it's incredibly slow. Someone suggested using numpy, but I'm having difficulty understanding.
Upvotes: 3
Views: 353
Reputation: 3256
Suppose you have two lists, one of length M and another of length N. With straightforward linear searches, it takes O(M * N) string comparisons to find which elements are in both lists.
You can improve on that using Python sets. Convert the lists to Python sets and use set intersection (&) to find their common elements. Because set lookups are hash-based, the complexity then drops to O(M + N) on average.
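A minimal sketch of the set idea, applied to the data from the question. One assumption to flag: the expected output counts repeated elements individually (2 / 4 for [grape, orange, grape, orange]), and a plain `&` would collapse those duplicates, so this version tests each element for membership in a set built from MyFruits instead. The names `intersect_ratios`, `my_fruits`, and `arrays` are made up for illustration.

```python
def intersect_ratios(my_fruits, arrays):
    """For each array, return (elements found in my_fruits) / (array length)."""
    fruit_set = set(my_fruits)  # built once; membership tests are O(1) on average
    ratios = []
    for arr in arrays:
        if not arr:
            # The question treats 0 / 0 as 0.
            ratios.append(0.0)
            continue
        # Count every element that appears in my_fruits, duplicates included.
        hits = sum(1 for item in arr if item in fruit_set)
        ratios.append(hits / len(arr))
    return ratios


my_fruits = ["apple", "orange", "banana", "apple", "pear"]
arrays = [
    ["apple", "orange"],
    ["blueberry", "watermelon", "pear"],
    ["grape", "orange", "grape", "orange"],
    [],
    ["cantaloupe"],
]
ratios = intersect_ratios(my_fruits, arrays)
print(ratios)
```

Each array is scanned once, so the total work is roughly proportional to the size of MyFruits plus the combined length of all the arrays.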
Upvotes: 1
Reputation: 2756
Is this any better than what you have, or the same?
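    ratios = []
    # Deduplicate first, otherwise a repeated fruit such as "apple" is counted twice.
    fruits = set(myFruits)
    for d in data:
        count = 0
        for fruit in fruits:
            count += d.count(fruit)
        # "len(d) or 1" avoids dividing by zero for an empty list.
        ratio = count / (len(d) or 1)
        ratios.append(ratio)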
Executable in replit: https://repl.it/@ToniAlatalo/CountOccurrences
I don't think numpy can help here, since it's meant for numerical processing, but there may be a better way to write what you need without it.
Upvotes: 0