Philip Engel

Reputation: 111

Computing with a large data file

I have a large list of partitions (say a few thousand), something like:

[[9,0,0,0,0,0,0,0,0], 
[8,1,0,0,0,0,0,0,0], 
..., 
[1,1,1,1,1,1,1,1,1]]

What I want to do is apply to each of them a function (which outputs a small number of partitions), then put all the outputs in a list and remove duplicates.

I am able to do this, but the problem is that my computer gets very slow if I put the above list directly into the Python file (especially when scrolling). What is making it slow? If it is the memory used to load the whole list, is there a way to put the partitions in another file and have the function read the list term by term?

EDIT: I am adding some code. My code is probably very inefficient because I'm quite an amateur. So what I really have is a list of lists of partitions, that I want to add to:

listofparts3 = [[[3],[2,1],[1,1,1]],
[[6],[5,1],...,[1,1,1,1,1,1]],...]

def addtolist3(n):
    # listofparts3[k] holds the partitions of 3*(k+1), so this selects those of n-3.
    a = n // 3 - 2
    added = []
    for counter, first in enumerate(listofparts3[a]):
        # Pad the partition with zeros (in place) up to length n.
        first.extend([0] * (n - len(first)))
        answer = lowering1(fock(first), -2)[0]
        for candidate in answer:
            # Keep the candidate only if no equal array has been collected yet.
            if not any((candidate == seen).all() for seen in added):
                added.append(candidate)
        print(counter)  # progress indicator
    return [partition(p).tolist() for p in added]

fock, lowering1, and partition are all fairly simple functions defined in earlier code. The above function, say addtolist3(24), takes all the partitions of 21 that I have and returns the desired list of partitions of 24, which I can then append to the end of listofparts3.

Upvotes: 1

Views: 69

Answers (1)

Raymond Hettinger

Reputation: 226181

A few thousand partitions use only a modest amount of memory, so that likely isn't the source of your problem.
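If you want to check this on your own machine, a rough sketch like the following may help. The `partitions` generator and the `rough_size` helper are hypothetical stand-ins written for this illustration (they are not the question's code), and `sys.getsizeof` gives only an approximate, CPython-specific figure.

```python
import sys

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists (illustrative helper)."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def rough_size(list_of_lists):
    """Approximate size in bytes; overcounts, since small ints are shared objects."""
    total = sys.getsizeof(list_of_lists)
    for p in list_of_lists:
        total += sys.getsizeof(p) + sum(sys.getsizeof(x) for x in p)
    return total

n = 21
# Pad every partition with zeros to length n, as in the question.
parts = [p + [0] * (n - len(p)) for p in partitions(n)]
size_bytes = rough_size(parts)
print(len(parts), "partitions, roughly", size_bytes // 1024, "KiB")
```

Even with the overcounting, the total should come out well under a few megabytes.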

One way to speed up function application is to use map() in Python 3 or itertools.imap() in Python 2.
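As a minimal sketch of that idea in Python 3: `add_box` below is a made-up stand-in for your own function (it is not `lowering1` or `fock`), chosen only because it also returns a small number of partitions per input. Since each call returns a list, `itertools.chain.from_iterable` is used here to flatten the results.

```python
from itertools import chain

def add_box(p):
    """Hypothetical stand-in: every partition obtained by adding 1 to one part of p."""
    out = []
    for i in range(len(p) + 1):
        q = list(p) + [0]
        # Incrementing part i is allowed only if the result stays non-increasing.
        if i == 0 or q[i] < q[i - 1]:
            q[i] += 1
            out.append([x for x in q if x])  # drop the trailing zero, if any
    return out

parts = [[2], [1, 1]]

# map() is lazy in Python 3: add_box runs only as results are consumed.
results = list(chain.from_iterable(map(add_box, parts)))
print(results)
```

Note that the flattened results can still contain duplicates, such as `[2, 1]` in this example.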

The fastest way to eliminate duplicates is to feed them into a Python set() object. Set elements must be hashable, so convert each partition from a list to a tuple first.
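A small sketch of set-based deduplication follows, using made-up sample data. If your outputs are NumPy arrays, as the `.all()` comparison in the question suggests, something like `tuple(a.tolist())` should serve the same purpose for one-dimensional arrays; that part is an assumption about your data.

```python
# Sample outputs with one duplicate; the values are illustrative only.
outputs = [[3], [2, 1], [2, 1], [1, 1, 1]]

# Lists are unhashable, so each partition is stored in the set as a tuple.
unique = {tuple(p) for p in outputs}

# Convert back to lists; sorting gives a reproducible order (sets are unordered).
deduped = [list(t) for t in sorted(unique, reverse=True)]
print(deduped)
```

Membership tests on a set take roughly constant time on average, whereas the nested loop in the question compares each new element against everything collected so far.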

Upvotes: 1
