Reputation: 495
I have a pandas DataFrame
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})
df
    A   B   C
0  10  20  20
1  20  30  30
2  30  10  10
and an ndarray w = np.array([0.2, 0.3, 0.4]).
How do I add a column D whose value is the dot product of each row with w?
That is, D[0] should be np.dot(df.iloc[0], w) = 16 and, likewise, D[1] should be np.dot(df.iloc[1], w) = 25.
(I am thinking of the apply() function, but I am not sure how to use it, and a for loop might be inefficient.)
Thanks,
Upvotes: 1
Views: 513
Reputation: 6485
You can also use a vectorized approach that exploits NumPy broadcasting:
df['D'] = np.sum(df.to_numpy() * w, axis=1)
# .to_numpy() requires pandas 0.24 or later; on older versions use .values instead
df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0
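To see why multiplying and then summing gives the row-wise dot product, here is a minimal sketch on the question's data. The names products and row_dots are only illustrative and do not come from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

# A (3, 3) array times a (3,) array: w is broadcast across every row,
# so each column is scaled by its matching weight.
products = df.to_numpy() * w

# Summing along axis=1 adds the scaled values within each row,
# which is the dot product of that row with w.
row_dots = products.sum(axis=1)

print(products.shape)
print(row_dots)
```

The intermediate products array has the same shape as the data, so this approach allocates one temporary the size of the frame before reducing it.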
Running a performance comparison with %timeit in the Spyder editor, here is what I got, ordered from slowest to fastest:
%timeit (df * w).sum(axis=1)
2.15 ms ± 590 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit df.apply(lambda p: np.dot(p.values, w), axis=1)
900 µs ± 76.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit np.sum((df.to_numpy() * w), axis=1)
19.2 µs ± 481 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
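Those numbers come from the 3-row frame, where fixed per-call overhead probably dominates, so the ranking may look different on larger data. Below is a rough sketch for repeating the comparison with the standard timeit module outside IPython. The frame size n_rows, the random data and the names big and candidates are assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 2000  # hypothetical size, adjust to match your own data
big = pd.DataFrame(rng.integers(0, 100, size=(n_rows, 3)), columns=list("ABC"))
w = np.array([0.2, 0.3, 0.4])

# The same three expressions that were timed above, wrapped as callables.
candidates = {
    "pandas multiply + sum": lambda: (big * w).sum(axis=1),
    "apply + np.dot": lambda: big.apply(lambda p: np.dot(p.values, w), axis=1),
    "numpy broadcast + sum": lambda: np.sum(big.to_numpy() * w, axis=1),
}

timings = {}
for name, fn in candidates.items():
    # Average wall-clock time over a few calls, in seconds.
    timings[name] = timeit.timeit(fn, number=5) / 5
    print(f"{name}: {timings[name] * 1e3:.3f} ms per call")
```

Absolute timings depend on the machine and the pandas and NumPy versions, so treat the output as a relative comparison only.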
Upvotes: 2
Reputation: 1372
You can do that by using apply over rows (axis=1) from pandas.DataFrame:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10], "C": [20, 30, 10]})
>>> w = np.array([0.2, 0.3, 0.4])
>>> df["D"] = df.apply(lambda p: np.dot(p.values, w), axis=1)
>>> df
    A   B   C     D
0  10  20  20  16.0
1  20  30  30  25.0
2  30  10  10  13.0
For efficiency's sake, though, you are probably better off converting the DataFrame to an ndarray and using matrix multiplication with matmul from numpy:
df["D"] = np.matmul(df.values, w)
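One thing to watch with df.values: once D has been added, the frame has four columns, so running the same line again with a three-element w raises a shape mismatch error. Here is a minimal sketch that selects the input columns explicitly. The cols list is my own name, not part of the question, and the @ operator is used as shorthand for np.matmul.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [20, 30, 10]})
w = np.array([0.2, 0.3, 0.4])

cols = ["A", "B", "C"]  # hypothetical name: the columns that line up with w

# Matrix-vector product of the selected columns with w.
df["D"] = df[cols].to_numpy() @ w

# Safe to rerun: D is never part of the input, so the shapes still match.
df["D"] = df[cols].to_numpy() @ w

# DataFrame.dot accepts an array as well and returns a Series
# indexed like df.
same = df[cols].dot(w)

print(df)
```

Selecting the columns by name also keeps the weights aligned with the intended columns if the frame later gains other fields.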
Upvotes: 4