Michael

Reputation: 13934

Applying function with multiple arguments to create a new pandas column

I want to create a new column in a pandas DataFrame by applying a function to two existing columns. Following this answer, I've been able to create a new column when the function takes only one column as an argument:

import pandas as pd
df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})

def fx(x):
    return x * x

print(df)
df['newcolumn'] = df.A.apply(fx)
print(df)

However, I cannot figure out how to do the same thing when the function requires multiple arguments. For example, how do I create a new column by passing column A and column B to the function below?

def fxy(x, y):
    return x * y

Upvotes: 289

Views: 405584

Answers (7)

Luca Clissa

Reputation: 918

The other answers focus on functions that take the dataframe's columns as inputs. More generally, if you want to use pandas .apply with a function that has additional arguments that are not columns, you can pass them as keyword arguments in the .apply() call. They are forwarded unchanged to every call, so they should be constants rather than columns:

def fxy(x, y):
    return x * y

# y=4 is passed to every call; y=df.B would hand each call the whole column
df['newcolumn1'] = df.A.apply(fxy, y=4)
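
As a rough sketch of how extra arguments reach the function: Series.apply forwards `args` as additional positional arguments and any remaining keywords as keyword arguments, unchanged, on every call. The helper `scale_and_shift` and the column name `scaled` are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

# Hypothetical helper: x comes from the column, the rest are constants
def scale_and_shift(x, factor, offset=0):
    return x * factor + offset

# factor is passed positionally via args, offset as a keyword argument
df["scaled"] = df["A"].apply(scale_and_shift, args=(2,), offset=1)
print(df)
```

Both `factor` and `offset` are the same for every row here; values that vary per row still have to come from a column.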

Upvotes: 4

Babatunde Mustapha

Reputation: 2663

This gives the desired result, and it also works for functions with more than two arguments, because each row of the selected columns is unpacked into positional arguments:

df['anothercolumn'] = df[['A', 'B']].apply(lambda x: fxy(*x), axis=1)
print(df)


    A   B  newcolumn  anothercolumn
0  10  20        100            200
1  20  30        400            600
2  30  10        900            300
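
To illustrate the "more than two arguments" case, here is a small sketch with a third column. The column `C`, the function `fxyz`, and the output column `combined` are assumptions added for the example; the column order in the selection has to match the function's parameter order.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10], "C": [1, 2, 3]})

# Hypothetical three-argument function
def fxyz(x, y, z):
    return x * y + z

# Each row of the selected columns is unpacked as (x, y, z)
df["combined"] = df[["A", "B", "C"]].apply(lambda row: fxyz(*row), axis=1)
print(df)
```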

Upvotes: 4

toto_tico

Reputation: 19047

If you need to create multiple columns at once:

  1. Create the dataframe:

    import pandas as pd
    df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
    
  2. Create the function:

    def fab(row):
        return row['A'] * row['B'], row['A'] + row['B']
    
  3. Assign the new columns:

    df['newcolumn'], df['newcolumn2'] = zip(*df.apply(fab, axis=1))
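
A possible variation on step 3, assuming a pandas version that supports the `result_type` parameter of DataFrame.apply: with `result_type="expand"`, the tuple returned for each row is spread across columns, so the zip is not needed. The column names `product` and `total` are placeholders chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fab(row):
    return row["A"] * row["B"], row["A"] + row["B"]

# Each returned tuple becomes one row of a new two-column DataFrame
expanded = df.apply(fab, axis=1, result_type="expand")
expanded.columns = ["product", "total"]  # placeholder names

# Attach the new columns to the original frame by index
df = df.join(expanded)
print(df)
```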
    

Upvotes: 52

Surya Chhetri

Reputation: 11588

Another clean option uses dict-style column access inside a lambda:

df["new_column"] = df.apply(lambda x: x["A"] * x["B"], axis=1)

or, since multiplication already works on whole columns:

df["new_column"] = df["A"] * df["B"]

Upvotes: 18

alko

Reputation: 48397

Alternatively, you can use the underlying NumPy function:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [10,20,30], "B": [20, 30, 10]})
>>> df['new_column'] = np.multiply(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300

or, in the general case, vectorize an arbitrary function:

>>> def fx(x, y):
...     return x*y
...
>>> df['new_column'] = np.vectorize(fx)(df['A'], df['B'])
>>> df
    A   B  new_column
0  10  20         200
1  20  30         600
2  30  10         300
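
np.vectorize matters most when the function cannot operate on whole columns directly, for example because it branches on its inputs. The sketch below uses a made-up function `larger_minus_smaller` and a made-up column `gap` to show that case. Note that np.vectorize is mainly a convenience: it still calls the Python function once per element, so it should not be expected to run as fast as a true NumPy operation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

# Hypothetical scalar function: the if statement needs single values,
# so calling it directly on two Series would raise a ValueError
def larger_minus_smaller(x, y):
    if x > y:
        return x - y
    return y - x

# np.vectorize calls the function once per pair of elements
df["gap"] = np.vectorize(larger_minus_smaller)(df["A"], df["B"])
print(df)
```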

Upvotes: 198

roman

Reputation: 117636

You can go with @greenafrican's example if it's possible for you to rewrite your function. But if you don't want to rewrite your function, you can wrap it in an anonymous function inside apply, like this:

>>> def fxy(x, y):
...     return x * y
...
>>> df['newcolumn'] = df.apply(lambda x: fxy(x['A'], x['B']), axis=1)
>>> df
    A   B  newcolumn
0  10  20        200
1  20  30        600
2  30  10        300
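
For comparison, a different technique that also leaves the function untouched is a plain list comprehension over the zipped columns. This is not the apply-based approach above, just a sketch of an alternative that avoids building a row object for every call.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [20, 30, 10]})

def fxy(x, y):
    return x * y

# Pair up the values of A and B and call fxy once per pair
df["newcolumn"] = [fxy(a, b) for a, b in zip(df["A"], df["B"])]
print(df)
```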

Upvotes: 429

greenafrican

Reputation: 2546

This solves the problem:

df['newcolumn'] = df.A * df.B

You could also write a function that takes a whole row and apply it row by row:

def fab(row):
    return row['A'] * row['B']

df['newcolumn'] = df.apply(fab, axis=1)

Upvotes: 60
