Reputation: 2270
Consider I have a dataframe as below:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[1, 2], [3, 4]], columns=['f1', 'f2'], index=['r1', 'r2'])
>>> df
f1 f2
r1 1 2
r2 3 4
How should I efficiently calculate the absolute different between row r1
and r2
and create another row as r3
to keep the result. That is to say the result will look like:
>>> for cn in df.columns:
... diff_dat.append(abs(df[cn]['r1'] - df[cn]['r2']))
...
>>> diff_dat
[2, 2]
>>> df.append(pd.DataFrame([diff_dat], index=['r3'], columns=df.columns))
f1 f2
r1 1 2
r2 3 4
r3 2 2
Upvotes: 1
Views: 1888
Reputation: 862751
Use loc
for select rows, subtract, get abs
and last add new row by setting with enlargement
:
df.loc['r3'] = (df.loc['r1'] - df.loc['r2']).abs()
print (df)
f1 f2
r1 1 2
r2 3 4
r3 2 2
Performance for 1000 columns:
np.random.seed(123)
df = pd.DataFrame(np.random.randint(10, size=(2, 1000)), index=['r1', 'r2']).add_prefix('f')-5
#Mayank Porwal solution
In [40]: %timeit df.append(df.diff().dropna().abs())
1.51 ms ± 19.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#jezrael solution
In [41]: %timeit df.loc['r3'] = (df.loc['r1'] - df.loc['r2']).abs()
663 µs ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#NaT3z solution
In [42]: %timeit df.loc["r3"] = df.apply(lambda c: abs(c["r1"] - c["r2"]), axis=0)
967 µs ± 80.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
For improve performance is possible use numpy
:
In [49]: %timeit df.loc['r3'] = np.abs(df.loc['r1'].values - df.loc['r2'].values)
414 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 5
Reputation: 34056
You can do this:
In [576]: df.append(df.diff().dropna().abs())
Out[583]:
f1 f2
r1 1.0 2.0
r2 3.0 4.0
r2 2.0 2.0
Upvotes: 5
Reputation: 344
The easiest solution for this is to use the .loc function which takes row indexes.
(Edited to remove code identical to that written by jezrael)
If you aren't familiar with pandas I'd suggest checking out the DataFrame.apply function since it allows broader manipulations of data (both row-wise and columns-wise). A solution would look like this:
df["r3"] = df.apply(lambda c: abs(c["r1"] - c["r2"]), axis=0)
pandas.DataFrame.apply is a powerful tool, letting you apply functions to the rows or columns in your dataset and taking advantage of pandas vectorisation.
Upvotes: 3