Reputation: 495

Dataframe - how to run calculations without using for loop?

I have a pandas DataFrame

df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})

df
    A   B   C
0  10  20  20
1  20  30  30
2  30  10  10

and an ndarray w = np.array([0.2, 0.3, 0.4]).

How do I add a column D whose value is the dot product of each row and w?

I.e. the value of D[0] would be np.dot(df.iloc[0], w) = 16.

Likewise, the value of D[1] would be np.dot(df.iloc[1], w) = 25.

(I am thinking of the apply() function but am not sure how to use it; a for loop might be inefficient.)

thanks,

Upvotes: 1

Answers (2)

FBruzzesi

Reputation: 6485

You can also use a vectorized approach that exploits NumPy broadcasting:

# .to_numpy() was added in pandas 0.24, if I remember correctly; on earlier versions use .values
df['D'] = np.sum(df.to_numpy() * w, axis=1)

df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0
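
To make the broadcasting step explicit, here is a minimal self-contained sketch using the data from the question. The intermediate names `values` and `weighted` are introduced only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

values = df.to_numpy()          # shape (3, 3): one row per DataFrame row
weighted = values * w           # w has shape (3,), so it is broadcast across every row
df["D"] = weighted.sum(axis=1)  # summing along each row gives the row-wise dot product

print(df)
```

Each element of `w` scales one column, so summing across `axis=1` yields the same values as calling `np.dot` on each row.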

Doing a performance analysis in the Spyder editor using %timeit, here is what I got, ordered from slowest to fastest:

%timeit (df * w).sum(axis=1)
2.15 ms ± 590 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit df.apply(lambda p: np.dot(p.values, w), axis=1)
900 µs ± 76.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.sum((df.to_numpy() * w), axis=1)
19.2 µs ± 481 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
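
If you want to repeat a comparison like this outside IPython, a rough sketch with the standard `timeit` module could look like the following. The labels in `candidates` and the call count of 200 are arbitrary choices for illustration. Absolute numbers will differ by machine and pandas version, and on a 3x3 frame they mostly reflect per-call overhead rather than how each approach scales.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

# Hypothetical labels for the three expressions timed above.
candidates = {
    "pandas multiply + sum": lambda: (df * w).sum(axis=1),
    "apply + np.dot": lambda: df.apply(lambda p: np.dot(p.values, w), axis=1),
    "numpy broadcast + sum": lambda: np.sum(df.to_numpy() * w, axis=1),
}

# Check that all approaches agree before comparing their speed.
results = {name: np.asarray(func()) for name, func in candidates.items()}

n_calls = 200  # arbitrary; raise it for more stable numbers
timings = {}
for name, func in candidates.items():
    timings[name] = timeit.timeit(func, number=n_calls) / n_calls
    print(f"{name}: {timings[name] * 1e6:.1f} us per call")
```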

Upvotes: 2

MkWTF

Reputation: 1372

You can do that by using apply over rows (axis=1) from pandas.DataFrame:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})
>>> w = np.array([0.2, 0.3, 0.4])
>>> df["D"] = df.apply(lambda p: np.dot(p.values, w), axis=1)
>>> df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0

Although, for efficiency's sake, you are probably better off turning the DataFrame into an ndarray and using matrix multiplication with matmul from NumPy:

df["D"] = np.matmul(df.values, w)
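
One caveat: `df.values` includes every column, so once `D` has been added the array has four columns and no longer lines up with `w`. Below is a minimal sketch that selects the input columns explicitly. The name `feature_cols` is hypothetical, and the `@` operator and `DataFrame.dot` are shown only as equivalent spellings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

feature_cols = ["A", "B", "C"]  # hypothetical name: the columns that pair up with w

# (3, 3) matrix times (3,) vector gives one dot product per row.
df["D"] = np.matmul(df[feature_cols].to_numpy(), w)

# Safe to run again: D is not in feature_cols, so the shapes still match.
d_operator = df[feature_cols].to_numpy() @ w  # same operation via the @ operator
d_pandas = df[feature_cols].dot(w)            # pandas-native equivalent, returns a Series
```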

Upvotes: 4
