How to vectorize a Python for loop that modifies each element of a DataFrame?

Question

I have a Python script, using pandas dataframes, that fills a dataframe by converting the elements of another dataframe. I could do it with a simple for loop or itertuples, but I wanted to see if it was possible to vectorize it for maximum speed (my dataframe is very large, ~60000x12000).

Here is an example of what I'm trying to do:

    #Sample data
    sample_list=[1,2,5]

I have a list of values like the one above. Each element of my new matrix is the sum of two elements of this list (the ones at the row and column positions), divided by a constant n:

    new_matrix[row, col] = (sample_list[row] + sample_list[col]) / n

So the expected output for n=2 would be:

    1    1.5  3
    1.5  2    3.5
    3    3.5  5

Right now I do this with a for loop that iterates over every element of an empty matrix and sets it to the value given by the formula. Is there any way to vectorize this operation, i.e. write something like new_matrix = 2 * old_matrix rather than looping over every row and column and setting new_matrix[row, col] = 2 * old_matrix[row, col]?

John Zwinck · Accepted Answer

First convert your list to an array:

    import numpy as np

    arr = np.asarray(sample_list)

Then note that your addition needs to broadcast to produce a 2D output. To add a "virtual" dimension to an array, use np.newaxis:

    arr[:, np.newaxis] + arr

That gives you:

array([[ 2,  3,  6],
       [ 3,  4,  7],
       [ 6,  7, 10]])

Dividing that array by n (2 in your example) gives the final result.
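To make the broadcasting step concrete, here is a minimal sketch that prints the intermediate shapes. The variable names `col`, `sums`, `outer` and `result` are illustrative only, and `np.add.outer` is shown as an alternative spelling of the same pairwise sum rather than as part of the approach above.

```python
import numpy as np

sample_list = [1, 2, 5]
n = 2  # the constant divisor from the question

arr = np.asarray(sample_list)

# Adding a new axis turns the 1D array of shape (3,) into a column of shape (3, 1).
col = arr[:, np.newaxis]
print(col.shape)   # (3, 1)

# (3, 1) + (3,) broadcasts to (3, 3): entry [row, col] is arr[row] + arr[col].
sums = col + arr
print(sums.shape)  # (3, 3)

# np.add.outer computes the same pairwise sums.
outer = np.add.outer(arr, arr)

# Divide the full 2D array by n to get the requested matrix.
result = sums / n
print(result)
```

No Python-level loop runs here; NumPy performs the pairwise addition in compiled code.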

Dividing first and adding afterwards is more efficient, because the division then runs over the 1D array (N elements) instead of the 2D result (N×N elements):

    arr = np.asarray(sample_list) / 2
    arr[:, np.newaxis] + arr
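As a sanity check, the divide-first version can be compared against an explicit double loop. This is a sketch under stated assumptions: the function names `scaled_pair_sums_loop` and `scaled_pair_sums_vectorized` are hypothetical, and the loop is my reading of the approach described in the question, not the asker's actual code.

```python
import numpy as np

def scaled_pair_sums_loop(values, n):
    """Reference version: fill the matrix one element at a time."""
    size = len(values)
    out = np.empty((size, size))
    for row in range(size):
        for col in range(size):
            out[row, col] = (values[row] + values[col]) / n
    return out

def scaled_pair_sums_vectorized(values, n):
    """Divide the 1D array first, then broadcast the addition to 2D."""
    scaled = np.asarray(values, dtype=float) / n
    return scaled[:, np.newaxis] + scaled

sample_list = [1, 2, 5]
expected = scaled_pair_sums_loop(sample_list, 2)
result = scaled_pair_sums_vectorized(sample_list, 2)

# Both versions should agree on every element.
assert np.allclose(result, expected)
print(result)
```

If the result is needed as a DataFrame, the array can be passed to `pd.DataFrame(result)`, optionally with `index` and `columns` labels.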
