Reputation: 1639
I have a dataframe with values like
A B
1 4
2 6
3 9
I need to add a new column by adding values from column A and B, like
A B C
1 4 5
2 6 8
3 9 12
I believe this can be done using a lambda function, but I can't figure out how to do it.
Upvotes: 100
Views: 343114
Reputation: 2290
eval lets you sum and create columns right away:
In [8]: df.eval('C = A + B', inplace=True)
In [9]: df
Out[9]:
A B C
0 1 4 5
1 2 6 8
2 3 9 12
Since inplace=True, you don't need to assign the result back to df.
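For comparison, here is a small sketch of the same expression without inplace=True, reusing the A/B frame from the question. In that case eval returns a new dataframe and leaves the original alone, so the result has to be captured (the name df2 is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# Without inplace=True, eval returns a new dataframe with the extra column
df2 = df.eval("C = A + B")

print(list(df.columns))   # the original frame keeps only A and B
print(df2)
```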
Upvotes: 0
Reputation: 828
You could do:
df['C'] = df.sum(axis=1)
If you only want to sum the numeric columns:
df['C'] = df.sum(axis=1, numeric_only=True)
The axis parameter takes either 0 or 1: axis=0 sums down each column (one total per column), while axis=1 sums across the columns of each row (one total per row).
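A quick sketch of the difference between the two axis values, using the A/B frame from the question (the names col_totals and row_totals are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# axis=0 collapses the rows: one total per column
col_totals = df.sum(axis=0)   # A -> 6, B -> 19

# axis=1 collapses the columns: one total per row
row_totals = df.sum(axis=1)   # 5, 8, 12

# The per-row totals are what the question asks for
df["C"] = row_totals
```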
Upvotes: 17
Reputation: 4420
You can do this using loc:
In [37]: df = pd.DataFrame({"A":[1,2,3],"B":[4,6,9]})
In [38]: df
Out[38]:
A B
0 1 4
1 2 6
2 3 9
In [39]: df['C']=df.loc[:,['A','B']].sum(axis=1)
In [40]: df
Out[40]:
A B C
0 1 4 5
1 2 6 8
2 3 9 12
Upvotes: 4
Reputation: 341
Concerning n00b's comment: "I get the following warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead"
I was getting the same warning. In my case it was because I was trying to perform the column addition on a dataframe that was created like this:
df_b = df[['colA', 'colB', 'colC']]
instead of:
df_c = pd.DataFrame(df, columns=['colA', 'colB', 'colC'])
df_b is a copy of a slice from df, whereas df_c is a new dataframe. So
df_c['colD'] = df['colA'] + df['colB']+ df['colC']
will add the columns and won't raise any warning. Same if .sum(axis=1) is used.
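Another common way to get an independent dataframe, not mentioned above, is to call .copy() on the selection. A minimal sketch, assuming a made-up frame with an extra column named other:

```python
import pandas as pd

# Hypothetical data; only the column names colA..colC come from the answer
df = pd.DataFrame({"colA": [1, 2], "colB": [3, 4], "colC": [5, 6], "other": [0, 0]})

# An explicit copy makes df_b independent of df
df_b = df[["colA", "colB", "colC"]].copy()

# Adding a column to the copy does not touch df
df_b["colD"] = df_b["colA"] + df_b["colB"] + df_b["colC"]
```

In pandas versions that emit this warning, assigning to an explicit copy like this should not trigger it.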
Upvotes: 4
Reputation: 160
I wanted to add a comment responding to the error message n00b was getting but I don't have enough reputation. So my comment is an answer in case it helps anyone...
n00b said:
I get the following warning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
He got this warning because whatever manipulations he did to his dataframe prior to creating df['C'] created a view into the dataframe rather than a copy of it. The warning didn't arise from the simple calculation df['C'] = df['A'] + df['B'] suggested by DeepSpace.
Have a look at the Returning a view versus a copy docs.
Upvotes: 3
Reputation: 11460
Building a little more on Anton's answer, you can add all the columns like this:
df['sum'] = df[list(df.columns)].sum(axis=1)
Upvotes: 88
Reputation: 5532
As of pandas version 0.16.0 you can use assign as follows:
df = pd.DataFrame({"A": [1,2,3], "B": [4,6,9]})
df.assign(C = df.A + df.B)
# Out[383]:
# A B C
# 0 1 4 5
# 1 2 6 8
# 2 3 9 12
You can add multiple columns this way as follows:
df.assign(C = df.A + df.B,
          Diff = df.B - df.A,
          Mult = df.A * df.B)
# Out[379]:
# A B C Diff Mult
# 0 1 4 5 3 4
# 1 2 6 8 4 12
# 2 3 9 12 6 27
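One thing worth spelling out: assign returns a new dataframe and does not modify df, so the result has to be captured if you want to keep the column. A short sketch on the same data (with_c is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# assign builds a new dataframe; df itself still has only A and B
with_c = df.assign(C=df.A + df.B)
print(list(df.columns))
print(list(with_c.columns))

# To keep the new column on df, assign the result back
df = df.assign(C=df.A + df.B)
```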
Upvotes: 17
Reputation: 31662
You could use the sum function to achieve that, as @EdChum mentioned in the comment:
df['C'] = df[['A', 'B']].sum(axis=1)
In [245]: df
Out[245]:
A B C
0 1 4 5
1 2 6 8
2 3 9 12
Upvotes: 38
Reputation: 797
The simplest way would be to use DeepSpace's answer. However, if you really want to use an anonymous function, you can use apply:
df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
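As a rough check that the two approaches agree, here is a sketch comparing apply with the plain column addition (df['A'] + df['B']) on the frame from the question. apply with axis=1 calls the lambda once per row, so on large frames the plain addition is usually the faster choice:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 9]})

# Row-by-row: the lambda receives each row as a Series
via_apply = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Vectorized: the whole columns are added in one operation
via_add = df["A"] + df["B"]

# Both give the same values
same_values = via_apply.tolist() == via_add.tolist()
```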
Upvotes: 56