Reputation: 13969
I would like to subtract one row of another dataframe from all rows of a dataframe (the difference from one row).
Is there an easy way to do this, like df-df2?
df = pd.DataFrame(abs(np.floor(np.random.rand(3, 5)*10)),
... columns=['a', 'b', 'c', 'd', 'e'])
df
Out[18]:
a b c d e
0 8 9 8 6 4
1 3 0 6 4 8
2 2 5 7 5 6
df2 = pd.DataFrame(abs(np.floor(np.random.rand(1, 5)*10)),
... columns=['a', 'b', 'c', 'd', 'e'])
df2
a b c d e
0 8 1 3 7 5
Here is an output that works for the first row; however, I want the remaining rows to be subtracted as well:
df-df2
a b c d e
0 0 8 5 -1 -1
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
Upvotes: 11
Views: 19174
Reputation: 370
Alternatively, you could simply use the apply function on all rows of df:
df3 = df.apply(lambda x: x-df2.squeeze(), axis=1)
# axis=1 because it should apply to rows instead of columns
# squeeze because we want to subtract a Series (the single row of df2), not a DataFrame
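A self-contained version of the apply approach, for anyone who wants to run it. The values are copied from the question's printed df and df2 rather than drawn at random, so the result can be checked against the expected differences; this is a sketch, not the answerer's exact session.

```python
import pandas as pd

# Values copied from the question's printed df and df2
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]], columns=list("abcde"))
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list("abcde"))

# squeeze() turns the 1x5 frame into a Series indexed by column name a..e
row = df2.squeeze()

# With axis=1, apply hands each row of df to the lambda as a Series;
# Series - Series aligns on the shared column labels a..e
df3 = df.apply(lambda x: x - row, axis=1)
print(df3)
```

One design note: squeeze() collapses every length-1 axis, so a 1x1 df2 would become a scalar; df2.squeeze(axis=0) only collapses the row axis and always yields a Series here.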
Upvotes: 1
Reputation: 128958
You can do this directly in pandas as well. (I used df2 = df.loc[[0]].)
In [80]: df.sub(df2,fill_value=0)
Out[80]:
a b c d e
0 0 0 0 0 0
1 7 6 0 7 8
2 4 4 3 6 2
[3 rows x 5 columns]
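In the output above, rows 1 and 2 are just df's own rows: fill_value=0 fills the labels missing from df2 (1 and 2) with zeros, so nothing is subtracted from them. The sketch below, using the question's printed values, contrasts that with passing df2's single row as a Series to the same DataFrame.sub method; the Series variant is a swapped-in technique, not part of the answer above.

```python
import pandas as pd

# Values copied from the question's printed df and df2
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]], columns=list("abcde"))
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list("abcde"))

# fill_value=0 treats df2's missing rows (labels 1 and 2) as zeros,
# so those rows of df come back unchanged
filled = df.sub(df2, fill_value=0)

# A Series argument: its index (a..e) is matched against df's columns,
# and the same row is subtracted from every row of df
by_row = df.sub(df2.iloc[0], axis="columns")
print(filled)
print(by_row)
```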
Upvotes: 6
Reputation: 879661
Pandas NDFrames generally try to perform operations on items with matching indices. df - df2 only performs subtraction on the first row, because the 0-indexed row is the only row with an index shared in common.
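The alignment rule can be checked directly. This sketch reuses the question's printed values and simply reports which index labels the two frames share and which rows of df - df2 end up entirely NaN.

```python
import pandas as pd

# Values copied from the question's printed df and df2
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]], columns=list("abcde"))
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list("abcde"))

diff = df - df2

# Index labels present in both frames: only 0
shared = list(df.index.intersection(df2.index))

# Rows whose label is absent from df2 have no partner, so every value is NaN
all_nan = diff.isna().all(axis=1).tolist()
print(shared)   # [0]
print(all_nan)  # [False, True, True]
```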
The operation you are looking for looks more like a NumPy array operation performed with "broadcasting":
In [21]: df.values-df2.values
Out[21]:
array([[ 0, 8, 5, -1, -1],
[-5, -1, 3, -3, 3],
[-6, 4, 4, -2, 1]], dtype=int64)
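Why broadcasting works here: NumPy compares the shapes (3, 5) and (1, 5) from the right; the trailing lengths match, and the length-1 leading axis of the second array is stretched to 3. The sketch below, with plain arrays holding the question's values, checks that the broadcast result equals subtracting an explicitly repeated copy of the row.

```python
import numpy as np

a = np.array([[8, 9, 8, 6, 4],
              [3, 0, 6, 4, 8],
              [2, 5, 7, 5, 6]])   # shape (3, 5), like df.values
b = np.array([[8, 1, 3, 7, 5]])   # shape (1, 5), like df2.values

diff = a - b  # b's length-1 axis is broadcast across a's 3 rows

# Explicit equivalent: repeat b's single row 3 times, then subtract element-wise
explicit = a - np.repeat(b, a.shape[0], axis=0)
print(diff.shape)                       # (3, 5)
print(np.array_equal(diff, explicit))   # True
```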
To package the result in a DataFrame:
In [22]: pd.DataFrame(df.values-df2.values, columns=df.columns)
Out[22]:
a b c d e
0 0 8 5 -1 -1
1 -5 -1 3 -3 3
2 -6 4 4 -2 1
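Two things to keep in mind with the .values route: the result gets a fresh 0..n-1 index unless index=df.index is passed too, and because labels are ignored, df2's columns must already be in the same order as df's. The sketch below assumes a hypothetical index x, y, z on df purely to make the index difference visible; to_numpy() is used as the newer equivalent of .values.

```python
import pandas as pd

# Hypothetical row labels x, y, z; values copied from the question
df = pd.DataFrame([[8, 9, 8, 6, 4],
                   [3, 0, 6, 4, 8],
                   [2, 5, 7, 5, 6]],
                  columns=list("abcde"), index=["x", "y", "z"])
df2 = pd.DataFrame([[8, 1, 3, 7, 5]], columns=list("abcde"))

# Passing only columns= gives the result a default 0..n-1 index
plain = pd.DataFrame(df.to_numpy() - df2.to_numpy(), columns=df.columns)

# Passing index= as well carries df's row labels over to the result
kept = pd.DataFrame(df.to_numpy() - df2.to_numpy(),
                    columns=df.columns, index=df.index)
print(plain.index.tolist())  # [0, 1, 2]
print(kept)
```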
Upvotes: 18