Sijan Shrestha

Reputation: 2266

How to fix a memory issue when the range is in the millions?

I ran into a bit of an issue. I am trying to find a fix for what happens when I pass in a large array.

The function below takes two lists of arrays and outputs a list of arrays.

a=[[...],[...],..]
b=[[...],[...],..]
op=[[...],[...],..]

Things run smoothly for a few arrays with few items, but they start to heat up when the array ranges are in the millions.

def get_list (a,b):
    mg = [] ## merge all data in "a"
    ng = [] ## merge all data in "b"
    for out in a: ##function to merge data in a
        for i in out:
            if i in mg:
                continue
            else:
                mg.append(i)
    for out in b: ##function to merge data in b
        for i in out:
            if i in ng:
                continue
            else:
                ng.append(i)
    ng = sorted(ng) ## sort the values from b
    mg = sorted(mg) ## sort the values from a
    op = []
    z =[]
    for m in mg: ## some simple logic that breaks array and creates a new one for output
        if m in ng: 
            if len(z) !=0:
                op.append(z)
                z =[]
            else:
                continue
        else:
            z.append(m)
    op.append(z)
    print("##"*20)
    print(op)
    return op

The following works:

get_list ([[1,2,3,4,5],[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]],[[3,4,5,6,7]])

But the following will use up all the memory and slow down the system:

a = [x for x in range(1,7+1,1)]
b = [x for x in range(5,20+1,1)]
c = [x for x in range(25,1000000000+1,1)]
n = [x for x in range(6,9+1,1)]
n2 = [x for x in range(8,11+1,1)]
n3 = [x for x in range(30,50+1,1)]

get_list([a,b,c],[n,n2,n3])

Is there a better way to write this code? Any suggestion or advice is much appreciated!

Upvotes: 0

Views: 56

Answers (1)

silgon

Reputation: 7191

Each of your two nested for loops with the if check can be written as follows:

result = set([item for sublist in l for item in sublist])

The list comprehension inside the brackets flattens your 2D list into a 1D list. In this code, l stands for your variable a or b. Once you have the 1D list, the set function returns its unique values.
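To show how that one-liner could fit into the whole function, here is a minimal sketch, assuming the goal is to return the same output as the original get_list. It is my reading of the question's logic rather than a tested drop-in replacement. The use of itertools.chain and of a set for ng are my own choices: `m in ng` on a list scans the list every time, while the same check on a set takes roughly constant time.

```python
from itertools import chain


def get_list(a, b):
    # Unique values from all sublists of "a", sorted once at the end.
    mg = sorted(set(chain.from_iterable(a)))
    # Unique values from "b"; kept as a set because it is only used for lookups.
    ng = set(chain.from_iterable(b))

    op = []
    z = []
    for m in mg:
        if m in ng:
            # A value from "b" closes the current run, if there is one.
            if z:
                op.append(z)
                z = []
        else:
            z.append(m)
    # Like the original, always append the last run (even if it is empty).
    op.append(z)
    return op


result = get_list(
    [[1, 2, 3, 4, 5], list(range(2, 21))],
    [[3, 4, 5, 6, 7]],
)
print(result)
```

This only removes the quadratic membership checks. It does not shrink the input lists themselves, so a list built from range(25, 1000000000+1) will still need a very large amount of memory before the function is even called.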

So far I have only given you a one-liner for the merging step, which should help somewhat with performance. However, explicit loops, and nested loops in particular, are a weak point of Python. If you had a matrix (a rectangular array, not a list of lists of different lengths), you could convert it to a NumPy array and use numpy.unique, which would run much faster because NumPy calls compiled C code.
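A small sketch of the NumPy route, assuming NumPy is installed. The sample values are made up for illustration. For a rectangular matrix, numpy.unique flattens the input and returns the sorted unique values. For sublists of different lengths, one option is to join them with numpy.concatenate first.

```python
import numpy as np

# Rectangular case: every row has the same length.
matrix = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])
unique_values = np.unique(matrix)  # flattened, sorted, duplicates removed
print(unique_values.tolist())

# Ragged case: rows of different lengths are concatenated into one 1D array.
ragged = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7]]
unique_ragged = np.unique(np.concatenate([np.asarray(row) for row in ragged]))
print(unique_ragged.tolist())
```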

Upvotes: 1
