Python: optimise a nested for loop with append

I have two nested for loops that usually run over a large amount of data. I want to optimise them and improve the speed as much as possible.

source = [['row1', 'row2', 'row3'],['Product', 'Cost', 'Quantity'],['Test17', '3216', '17'], ['Test18' , '3217' , '18' ], ['Test19', '3218', '19' ], ['Test20', '3219', '20']]

Creating a generator object:

it = iter(source)
variables = ['row2', 'row3']
variables_indices = [1, 2]
getkey = rowgetter(*key_indices)
for row in it:
    k = getkey(row)
    for v, i in zip(variables, variables_indices):
        try:
            o = list(k)  # populate with key values initially
            o.append(v)  # add variable
            o.append(row[i]) # add value
            yield tuple(o)
        except IndexError:
            pass

import operator

def rowgetter(*indices):
    if len(indices) == 0:
        #print("STEP 7")
        return lambda row: tuple()
    elif len(indices) == 1:
        #print("STEP 7")
        # if only one index, we cannot use itemgetter, because we want a
        # singleton sequence to be returned, but itemgetter with a single
        # argument returns the value itself, so let's define a function
        index = indices[0]
        return lambda row: (row[index],)
    else:
        return operator.itemgetter(*indices)

This returns tuples, but it takes too long: on average 100 seconds for 100,000 rows (the source in the example is only a small sample). Can anyone help reduce this time, please?

Note: I also tried inline loops and a list comprehension, but those do not return a result for each iteration.

Upvotes: 4

Views: 1034

Answers (2)

user2390182

Reputation: 73460

Some improvements are marked below, but they do not change the algorithmic complexity:

zipped = list(zip(variables, variables_indices))  # create once and reuse

for row in it:
    for v, i in zipped:  # unpack the variable name and its column index
        try:
            yield (*getkey(row), v, row[i])  # avoid building list and tuple conversion 
        except IndexError:
            pass
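
For reference, here is a minimal self-contained sketch of the loop above wrapped in a generator function so that it can be run as is. The function name `melt_rows`, the sample rows, and the choice of column 0 as the key are assumptions made for illustration; the question does not show `key_indices`. The sketch also computes the key once per row rather than once per variable, which is a small extra change on top of the snippet above.

```python
import operator


def rowgetter(*indices):
    """Return a function that extracts the given columns as a tuple."""
    if len(indices) == 0:
        return lambda row: ()
    if len(indices) == 1:
        index = indices[0]
        return lambda row: (row[index],)
    return operator.itemgetter(*indices)


def melt_rows(rows, key_indices, variables, variables_indices):
    """Yield (*key, variable, value) for every row and variable column.

    Hypothetical wrapper: the name and signature are not from the question.
    """
    getkey = rowgetter(*key_indices)
    zipped = list(zip(variables, variables_indices))  # built once, reused
    for row in rows:
        k = getkey(row)  # computed once per row, not once per variable
        for v, i in zipped:
            try:
                yield (*k, v, row[i])  # no intermediate list
            except IndexError:
                pass  # short row: skip the missing column


source = [
    ['Test17', '3216', '17'],
    ['Test18', '3217', '18'],
    ['Test19', '3218'],  # deliberately short row to exercise IndexError
]

# Assumption: column 0 is the key column.
result = list(melt_rows(source, [0], ['row2', 'row3'], [1, 2]))
for item in result:
    print(item)
```

As in the question's code, the key lookup sits outside the `try`, so a row that is missing a key column raises instead of being skipped.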

Upvotes: 2

Jean-François Fabre

Reputation: 140178

Creating a list from k, appending two items, and then converting it to a tuple creates a lot of copies.

I'd propose a helper function with a generator that yields from k and then yields the remaining elements. Wrap that in tuple() to get a ready-to-use function:

k = [1,2,3,4]

def make_tuple(k,a,b):
    def gen(k,a,b):
        yield from k
        yield a
        yield b
    return tuple(gen(k,a,b))

result = make_tuple(k,12,14)

output:

(1, 2, 3, 4, 12, 14)
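
Whether this helper is actually faster than the original list-based version is best measured rather than assumed. The following is a rough sketch using the standard `timeit` module to compare three ways of building the same tuple: the list-and-append approach from the question, the generator helper above, and plain tuple unpacking. The function names `via_list`, `via_generator`, and `via_unpacking` are hypothetical, and the timings depend on the Python version and machine, so none are quoted here.

```python
import timeit


def via_list(k, a, b):
    # Approach from the question: copy into a list, append, convert.
    o = list(k)
    o.append(a)
    o.append(b)
    return tuple(o)


def via_generator(k, a, b):
    # Generator helper, equivalent to make_tuple above.
    def gen():
        yield from k
        yield a
        yield b
    return tuple(gen())


def via_unpacking(k, a, b):
    # Tuple display with iterable unpacking.
    return (*k, a, b)


k = (1, 2, 3, 4)
candidates = [via_list, via_generator, via_unpacking]

# All three should build the same tuple.
results = [f(k, 12, 14) for f in candidates]

for f in candidates:
    seconds = timeit.timeit(lambda: f(k, 12, 14), number=100_000)
    print(f"{f.__name__}: {seconds:.3f} s")
```

Running this on the target interpreter with a realistic key length gives a fairer picture than a single small example.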

Upvotes: 1
