Reputation: 19627

What is the most effective way to incremente a large number of values in Python?

Okay, sorry if my problem seems a bit rough. I'll try to explain it in a figurative way, I hope this is satisfactory.

10 children.
5 boxes.
Each child chooses three boxes.
Each box is opened:
- If it contains something, all children selected this box gets 1 point
- Otherwise, nobody gets a point.

My question is about what I put in bold. Because in my code, there are lots of kids and lots of boxes.

Currently, I proceed as follows:

children = {"child_1" : 0, ... , "child_10": 0}

gp1 = ["child_3", "child_7", "child_10"] #children who selected the box 1
...
gp5 = ["child_2", "child_5", "child_8", "child_10"]

boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]

for box in boxes:
    if box[0] == 1: #something inside
        for child in box[1]:
            children[child] += 1

I worry mainly about the for loop that assigns each child an extra point. Because in my final code, I have many many children, I fear that doing so would slow the program too.

Is there a more efficient way for all children of the same group may have their point faster?

Upvotes: 10

Answers (6)

FMc

Reputation: 42421

Two general points:

(1) Based on what you've told us, there's no reason to focus your energy on minor performance optimizations. Your time would be better spent thinking about ways to make your data structures less awkward and more communicative. A bunch of interrelated dicts, lists, and tuples quickly becomes difficult to maintain. For an alternative, see the example below.

(2) As the game designer, you understand that events follow a certain sequence: first the kids select their boxes, and later they discover whether they get points for them. But you don't have to implement it that way. A kid can choose a box and get points (or not) immediately. If there's a need to preserve the child's ignorance about such outcomes, the parts of your algorithm that depend on such ignorance can enforce that veil of secrecy as needed. The upshot: there is no need for a box to loop through its children, awarding points to each one; instead, award the points immediately to kids as boxes are selected.

import random

class Box(object):
    def __init__(self, name):
        self.name = name
        self.prize = random.randint(0,1)

class Child(object):
    def __init__(self, name):
        self.name = name
        self.boxes = []
        self.score = 0
        self._score = 0

    def choose(self, n, boxes):
        bs = random.sample(boxes, n)
        for b in bs:
            self.boxes.append(b)
            self._score += b.prize

    def reveal_score(self):
        self.score = self._score

boxes = [Box(i) for i in range(5)]
kids = [Child(i) for i in range(10)]

for k in kids:
    k.choose(3, boxes)

# Later in the game ...
for k in kids:
    k.reveal_score()
    print (k.name, k.score), '=>', [(b.name, b.prize) for b in k.boxes]

Upvotes: 1

nneonneo

Reputation: 179452

Represent children as indices into arrays, not as strings:

childrenScores = [0] * 10
gp1 = [2,6,9] # children who selected box 1
...
gp5 = [1,4,7,9]

boxes = [(0,gp1), (0,gp2), (1,gp3), (1,gp4), (0,gp5)]

Then, you can store childrenScores as a NumPy array and use advanced indexing:
```
childrenScores = np.zeros(10, dtype=int)
...
for box in boxes:
    if box[0]:
        childrenScores[box[1]] += 1 # NumPy advanced indexing
```
This still involves a loop somewhere, but the loop is deep inside NumPy instead, which should provide a meaningful speedup.

Upvotes: 5

Ofir Israel

Reputation: 3913

What I would do

In the gpX lists don't save the "name of the child" (e.g. "child_10") but save a reference to the child's number of points.

How to do that

Using the fact that lists are objects in python, you can:

Change the children dict to look like: children = {"child_0": [0], "child_1": [0], ...} and so on.
When you assign to group, don't assign the key but assign the value (e.g. gp1.append(children["child_0"])).
The loop should then look like: for child in box[1]: child[0]+=1. This WILL update the children dict.

EDIT:

Why this is faster: Because you leave out the part where you search for children[child], which might be costly.

This technique works because by storing the totals in a mutable type, and appending those values to the group lists, both the dict value and each box's list value will point to the same list entries, and changing one will change the other.

Upvotes: 1

Dhruv Kapoor

Reputation: 1081

If you don't need to immediately print the number of points for every child, you could calculate it on demand, thus saving time. This could help if you only need to query a child every now and then for how many points it has. You can cache each result as you obtain is so you don't go about calculating it again the next time you need it.

Firstly, you'll need to know which groups a child belongs to. We'll store this information as map we'll call childToGroupsMap, which will map each child to an array holding the boxes it belongs to, like so:

childToGroupsMap = {}
for child in children:
    childToGroupsMap[child[0]] = []
for box in boxes:
    for child in box[1]:
        if (box[1] not in childToGroupsMap[child]):
            childToGroupsMap[child].append(box[1])

This constructs the reverse map from children to boxes.

It'll also help to have a map from each box to a boolean representing whether it's been opened:

boxToOpenedMap = {}
for box in boxes:
    boxToOpenedMap[box[1]] = box[0]

Now, when someone queries how many points a child has, you can go through each of the boxes it belongs to (using childToGroupsMap, of course), and simply count how many of those boxes have been mapped to 1 in the map boxes:

def countBoxesForChild(child):
    points = 0
    for box in childToGroupsMap[child]
        if boxToOpenedMap[box] == 1:
            points += 1
    return points

To make this better, you can cache the resulting number of points. Make a map like so:

childToPointsCalculated = {}
for child in children:
    childToPointsCalculated[child[0]] = -1

Where the -1 denotes that we don't yet know how many points this child has.

Finally, you can modify your countBoxesForChild function to exploit the cache:

def countBoxesForChild(child):
    if childToPointsCalculated[child] != -1
        return childToPointsCalculated[child]

    points = 0
    for box in childToGroupsMap[child]
        if boxToOpenedMap[box] == 1:
            points += 1
    childToPointsCalculated[child] = points
    return points

Upvotes: 0

fabmilo

Reputation: 48330

The only speed up that I can think of is to use numpy arrays and stream the sum operation.

children[child] += np.ones(len(children[child]))

You should benchmark the operation and see if that is too slow for your business case.

Upvotes: 2

Mark R. Wilkins

Reputation: 1302

One way or another, you're going to be looping over the children, and your answer appears to avoid looping over children who don't get any points.

It might be slightly faster to use filter or itertools.ifilter to pick the boxes that have something in them:

import itertools

...

for box in itertools.ifilter(lambda x: x[0], boxes):
    for child in box[1]
        children[child] += 1

Upvotes: 0

What is the most effective way to incremente a large number of values in Python?

Answers (6)

Related Questions