Reputation: 2329
I have a table of sensor data in which some columns are measurements and some are sensor biases. For example, something like this:
df=pd.DataFrame({'x':[1.0,2.0,3.0],'y':[4.0,5.0,6.0],
'dx':[0.25,0.25,0.25],'dy':[0.5,0.5,0.5]})
     dx   dy    x    y
0  0.25  0.5  1.0  4.0
1  0.25  0.5  2.0  5.0
2  0.25  0.5  3.0  6.0
I can add a column to the table by subtracting the bias from the measurement like this:
df['newX'] = df['x'] - df['dx']
     dx   dy    x    y  newX
0  0.25  0.5  1.0  4.0  0.75
1  0.25  0.5  2.0  5.0  1.75
2  0.25  0.5  3.0  6.0  2.75
But I'd like to do that for many columns at once. This doesn't work:
df[['newX','newY']] = df[['x','y']] - df[['dx','dy']]
for two reasons, it seems.
['x', 'y', 'dx', 'dy'].
Obviously, I can iterate over the columns and handle each one individually, but is there a more compact way to do this, one that is more analogous to the one-column solution?
Upvotes: 3
Views: 15558
Reputation: 879899
DataFrames generally align operations such as arithmetic on both column and row indices. Since df[['x','y']] and df[['dx','dy']] have different column names, the dx column is not subtracted from the x column, and similarly for the y columns. Instead, the result contains the union of the column names, filled with NaN.
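To see the alignment in action, here is a small sketch using the question's data. The variable name misaligned is chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# The two operands share no column labels, so pandas aligns them on the
# union of the labels and has nothing to subtract in any column.
misaligned = df[['x', 'y']] - df[['dx', 'dy']]

print(sorted(misaligned.columns))      # all four labels appear in the result
print(misaligned.isna().all().all())   # True when every entry is NaN
```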
In contrast, if you subtract a NumPy array from a DataFrame, the operation is done elementwise, since the NumPy array has no pandas-style indices to align on.
Hence, if you use df[['dx','dy']].values to extract a NumPy array consisting of the values in df[['dx','dy']], then your assignment can be done as desired:
import pandas as pd
df = pd.DataFrame({'x':[1.0,2.0,3.0],'y':[4.0,5.0,6.0],
'dx':[0.25,0.25,0.25],'dy':[0.5,0.5,0.5]})
df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values
print(df)
yields
     dx   dy    x    y  newx  newy
0  0.25  0.5  1.0  4.0  0.75   3.5
1  0.25  0.5  2.0  5.0  1.75   4.5
2  0.25  0.5  3.0  6.0  2.75   5.5
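For more than a couple of columns, the same idea can be driven by lists of names. The following is a minimal sketch, not the only way to do it. It assumes each bias column is named 'd' plus the measurement name, as in the question. The 'new' prefix and the variable names are arbitrary choices for illustration, and to_numpy() is used as the method counterpart of .values available in newer pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

measure_cols = ['x', 'y']                      # measurement columns
bias_cols = ['d' + c for c in measure_cols]    # assumed naming: 'dx', 'dy'
new_cols = ['new' + c for c in measure_cols]   # illustrative names: 'newx', 'newy'

# Dropping the labels on the right-hand side makes the subtraction positional.
corrected = df[measure_cols] - df[bias_cols].to_numpy()
corrected.columns = new_cols

# Attach the corrected columns to the original table; rows align on the index.
df = pd.concat([df, corrected], axis=1)
print(df)
```

Building the result as a separate DataFrame and concatenating it avoids relying on list-style assignment to columns that do not exist yet.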
Beware that if you try to assign a NumPy array (on the right-hand side) to a DataFrame (on the left-hand side), the column names specified on the left must already exist.
In contrast, when assigning a DataFrame on the right-hand side to a DataFrame on the left, new column names can be used. In this case pandas zips the keys (the new column names) on the left with the columns on the right and assigns values in column order instead of aligning by column name:
for k1, k2 in zip(key, value.columns):
    self[k1] = value[k2]
Thus, using a DataFrame on the right
df[['newx','newy']] = df[['x','y']] - df[['dx','dy']].values
works, but using a NumPy array on the right
df[['newx','newy']] = df[['x','y']].values - df[['dx','dy']].values
does not.
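The zip-based pairing described above can also be written out by hand, which avoids depending on how a particular pandas version handles list-style assignment to new columns. This is a sketch of the same idea, not pandas' actual implementation, and new_names is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0], 'y': [4.0, 5.0, 6.0],
                   'dx': [0.25, 0.25, 0.25], 'dy': [0.5, 0.5, 0.5]})

# Right-hand side: a DataFrame that keeps the column labels 'x' and 'y'.
value = df[['x', 'y']] - df[['dx', 'dy']].to_numpy()

new_names = ['newx', 'newy']

# Pair each new name with a right-hand column by position, then assign
# one column at a time; single-column assignment can create new columns.
for new_name, old_name in zip(new_names, value.columns):
    df[new_name] = value[old_name]

print(df[new_names])
```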
Upvotes: 12