Reputation: 593
My df looks as follows:
Index Country Val1 Val2 ... Val10
1 Australia 1 3 ... 5
2 Bambua 12 33 ... 56
3 Tambua 14 34 ... 58
I'd like to subtract Val1 from Val10 for each country, so the output looks like:
Country Val10-Val1
Australia 4
Bambua 44
Tambua 44
So far I've got:
def myDelta(row):
    data = row[['Val10', 'Val1']]
    return pd.Series({'Delta': np.subtract(data)})

def runDeltas():
    myDF = getDF() \
        .apply(myDelta, axis=1) \
        .sort_values(by=['Delta'], ascending=False)
    return myDF
runDeltas results in this error:
ValueError: ('invalid number of arguments', u'occurred at index 9')
What's the proper way to fix this?
Upvotes: 34
Views: 189566
Reputation: 2387
Another, more recent, way of doing this is to use the sub() method:
df['Val_1_minus_10'] = df.loc[:,'Val1'].sub(df.loc[:,'Val10'])
From the documentation:
"Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub."
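To illustrate what fill_value changes compared with the plain - operator, here is a minimal sketch. The missing Val1 entry is hypothetical data added for the illustration; it is not part of the question's dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the Val1 entry for Bambua is missing (NaN).
df = pd.DataFrame({
    "Country": ["Australia", "Bambua", "Tambua"],
    "Val1": [1, np.nan, 14],
    "Val10": [5, 56, 58],
})

# Plain subtraction propagates the missing value into the result.
plain = df["Val10"] - df["Val1"]

# sub() with fill_value=0 treats the missing Val1 as 0 before subtracting.
filled = df["Val10"].sub(df["Val1"], fill_value=0)

print(plain)
print(filled)
```

Whether 0 is a sensible stand-in for a missing value depends on the data; without fill_value, sub() behaves like the - operator.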
Upvotes: 0
Reputation: 2784
Given the following dataframe:
import pandas as pd
df = pd.DataFrame([["Australia", 1, 3, 5],
["Bambua", 12, 33, 56],
["Tambua", 14, 34, 58]
], columns=["Country", "Val1", "Val2", "Val10"]
)
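It comes down to a simple element-wise subtraction of two columns (shown here as Val1 - Val10; swap the operands for the Val10 - Val1 difference the question asks for):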
>>> df["Val1"] - df["Val10"]
0 -4
1 -44
2 -44
dtype: int64
You can also store this into a new column with:
>>> df['Val_1_minus_10'] = df['Val1'] - df['Val10']
>>> df
Country Val1 Val2 Val10 Val_1_minus_10
0 Australia 1 3 5 -4
1 Bambua 12 33 56 -44
2 Tambua 14 34 58 -44
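To get the output the question describes (Country plus the Val10 - Val1 difference, sorted from largest to smallest), the same column arithmetic can be chained with sort_values. This is only a sketch: the column name Delta is borrowed from the question's code, and the dataframe is the sample one above rather than whatever getDF() returns.

```python
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5],
     ["Bambua", 12, 33, 56],
     ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

result = (
    df.assign(Delta=df["Val10"] - df["Val1"])   # vectorized difference
      .loc[:, ["Country", "Delta"]]             # keep only the columns of interest
      .sort_values(by="Delta", ascending=False) # largest difference first
)
print(result)
```

No apply is needed here, because the subtraction already operates on whole columns.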
Upvotes: 34
Reputation: 2300
You can also use eval here:
In [12]: df.eval('Val10_minus_Val1 = Val10-Val1', inplace=True)
In [13]: df
Out[13]:
Country Val1 Val2 Val10 Val10_minus_Val1
0 Australia 1 3 5 4
1 Bambua 12 33 56 44
2 Tambua 14 34 58 44
Since inplace=True, you don't have to assign it back to df.
Upvotes: 1
Reputation: 4863
Although this is an old question, pandas allows subtracting two DataFrames or Series using subtract (pandas.DataFrame.subtract or pandas.Series.subtract):
import pandas as pd
df = pd.DataFrame([["Australia", 1, 3, 5],
["Bambua", 12, 33, 56],
["Tambua", 14, 34, 58]
], columns=["Country", "Val1", "Val2", "Val10"]
)
df["Val1"].subtract(df["Val2"])
Output:
0 -2
1 -21
2 -20
dtype: int64
Upvotes: 1
Reputation: 4496
You can do this by using a lambda function and assigning the result to a new column.
df['Val10-Val1'] = df.apply(lambda x: x['Val10'] - x['Val1'], axis=1)
print(df)
Upvotes: 11
Reputation: 15568
You can also use the pandas.DataFrame.assign function, e.g.:
import numpy as np
import pandas as pd
df = pd.DataFrame([["Australia", 1, 3, 5],
["Bambua", 12, 33, 56],
["Tambua", 14, 34, 58]
], columns=["Country", "Val1", "Val2", "Val10"]
)
df = df.assign(Val10_minus_Val1 = df['Val10'] - df['Val1'])
The best part of assign is that you can add as many assignments as you wish, e.g. getting both the difference and then the log of it:
df = df.assign(Val10_minus_Val1 = df['Val10'] - df['Val1'], log_result = lambda x: np.log(x.Val10_minus_Val1) )
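As a variation, both assignments can be written as callables, so the expression does not depend on the outer df variable and can sit in the middle of a method chain. This sketch assumes a reasonably recent pandas version, in which later keyword arguments to assign may refer to columns created earlier in the same call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5],
     ["Bambua", 12, 33, 56],
     ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

out = df.assign(
    # Each callable receives the dataframe as built up so far.
    Val10_minus_Val1=lambda d: d["Val10"] - d["Val1"],
    log_result=lambda d: np.log(d["Val10_minus_Val1"]),
)
print(out)
```

assign returns a new dataframe, so df itself is left unchanged unless the result is assigned back to it.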
Upvotes: 8
Reputation: 91
I ran into this today and wanted to share it. As mentioned above, you can simply use:
df['Val10-Val1'] = df['Val10']-df['Val1']
but sometimes you might need to use the apply function, in which case you can use the following line (axis=1 is required so that the lambda receives one row at a time):
df['Val10-Val1'] = df.apply(lambda row: row['Val10']-row['Val1'], axis=1)
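If apply has to stay, as in the question's myDelta, note that a row-wise apply needs axis=1 and that np.subtract expects two operands. Passing it a single two-element selection is most likely what triggers the "invalid number of arguments" error, though the exact exception type and message vary between NumPy versions. A minimal sketch of a corrected version, with my_delta as a hypothetical rename of the question's function and the sample dataframe standing in for getDF():

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["Australia", 1, 3, 5],
     ["Bambua", 12, 33, 56],
     ["Tambua", 14, 34, 58]],
    columns=["Country", "Val1", "Val2", "Val10"],
)

def my_delta(row):
    # np.subtract(a, b) computes a - b; both operands are passed explicitly.
    return pd.Series({"Delta": np.subtract(row["Val10"], row["Val1"])})

# axis=1 hands each row to my_delta; the result is a one-column dataframe.
deltas = df.apply(my_delta, axis=1).sort_values(by="Delta", ascending=False)
print(deltas)
```

On large dataframes the vectorized df['Val10'] - df['Val1'] form is usually much faster than a row-wise apply.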
Upvotes: 0
Reputation: 162
Using this as the df:
df = pd.DataFrame([["Australia", 1, 3, 5],
["Bambua", 12, 33, 56],
["Tambua", 14, 34, 58]
], columns=["Country", "Val1", "Val2", "Val10"]
)
You can also do the subtraction and put it into a new column as follows.
>>> df['Val_Diff'] = df['Val10'] - df['Val1']
>>> df
Country Val1 Val2 Val10 Val_Diff
0 Australia 1 3 5 4
1 Bambua 12 33 56 44
2 Tambua 14 34 58 44
Upvotes: 16