Elliot Roberts
Elliot Roberts

Reputation: 940

More memory efficient way of making a dictionary?

VERY sorry for the vagueness, but I don't actually know what part of what I'm doing is inefficient.

I've made a program that takes a list of positive integers (example*):

[1, 1, 3, 5, 16, 2, 4, 6, 6, 8, 9, 24, 200,]

*the real lists can be up to 2000 in length and the elements between 0 and 100,000 exclusive

And creates a dictionary where each number tupled with its index (like so: (number, index)) is a key and the value for each key is a list of every number (and that number's index) in the input that it goes evenly into.

So the entry for the 3 would be: (3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]

My code is this:

num_dict = {}
sorted_list = sorted(beginning_list)

for a2, a in enumerate(sorted_list):
    num_dict[(a, a2)] = []

for x2, x in enumerate(sorted_list):
    for y2, y in enumerate(sorted_list[x2 + 1:]):
        if y % x == 0:
            pair = (y, y2 + x2 + 1)
            num_dict[(x, x2)].append(pair)

But, when I run this script, I hit a MemoryError.

I understand that this means that I am running out of memory but in the situation I'm in, adding more ram or updating to a 64-bit version of python is not an option.

I am certain that the problem is not coming from the list sorting or the first for loop. It has to be the second for loop. I just included the other lines for context.

The full output for the list above would be (sorry for the unsortedness, that's just how dictionaries do):

(200, 12): []
(6, 7): [(24, 11)]
(16, 10): []
(6, 6): [(6, 7), (24, 11)]
(5, 5): [(200, 12)]
(4, 4): [(8, 8), (16, 10), (24, 11), (200, 12)]
(9, 9): []
(8, 8): [(16, 10), (24, 11), (200, 12)]
(2, 2): [(4, 4), (6, 6), (6, 7), (8, 8), (16, 10), (24, 11), (200, 12)]
(24, 11): []
(1, 0): [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(1, 1): [(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(3, 3): [(6, 6), (6, 7), (9, 9), (24, 11)]

Is there a better way of going about this?

EDIT:

This dictionary will then be fed into this:

ans_set = set()
for x in num_dict:
    for y in num_dict[x]:
        for z in num_dict[y]:
            ans_set.add((x[0], y[0], z[0]))
return len(ans_set)

to find all unique possible triplets in which the 3rd value can be evenly divided by the 2nd value which can be evenly divided by the 1st.

If you think you know of a better way of doing the entire thing, I'm open to redoing the whole of it.

Final Edit

I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.

def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total

And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):

def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly diveded by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)

Upvotes: 3

Views: 197

Answers (2)

tdelaney
tdelaney

Reputation: 77347

You rebuild tuples in places like pair = (y, y2 + x2 + 1) and num_dict[(x, x2)].append(pair) when you could build a canonical set of tuples early on and then just put references in the containers. I cobbled up a 2000 item test my machine that works. I have python 3.4 64 bit with a relatively modest 3.5 GIG of RAM...

import random

# a test list that should generate longish lists
l = list(random.randint(0, 2000) for _ in range(2000))

# setup canonical index and sort ascending
sorted_index = sorted((v,i) for i,v in enumerate(l))

num_dict = {}
for idx, vi in enumerate(sorted_index):
    v = vi[0]
    num_dict[vi] = [vi2 for vi2 in sorted_index[idx+1:] if not vi2[0] % v]

for item in num_dict.items():
    print(item)

Upvotes: 2

paxdiablo
paxdiablo

Reputation: 881523

Well, you could start by not unnecessarily duplicating information.

Storing full tuples (number and index) for each multiple is inefficient when you already have that information available.

For example, rather than:

(3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]

(the 16 appears to be wrong there as it's not a multiple of 3 so I'm guessing you meant 15) you could instead opt for:

(3, 2): [15, 6, 9, 24]
(6, 7): ...

That pretty much halves your storage needs since you can go from the 6 in the list and find all its indexes by searching the tuples. That will, of course, be extra processing effort to traverse the list but it's probably better to have a slower working solution than a faster non-working one :-)


You could reduce the storage even more by not storing the multiples at all, instead running through the tuple list using % to see if you have a multiple.


But, of course, this all depends on your actual requirements which would be better off stating the intent of what your trying to achieve rather than pre-supposing a solution.

Upvotes: 2

Related Questions