yoshiserry

Reputation: 21355

Write a pandas/Python function that takes multiple columns of a pandas dataframe as inputs

In a dataframe with two columns, I can easily create a third without a function when the operation is numerical, such as multiplication: df["new"] = df["one"] * df["two"].

However, what if I need to pass more than two parameters to a function, and those parameters are columns from a dataframe?

Passing one column at a time is simple using df.apply(my_func), but what if the function requires three columns, as in this definition:

def WordLength(col1,col2,col3):
    return max(len(col1),len(col2),len(col3))

For example, the function WordLength would return the length of the longest word among the three columns passed into it.

I know this doesn't work, but I imagine something like the following, which stores the result of a three-parameter function in a dataframe column:

df["word_length"]= df.apply(WordLength, [[param1,param2,param3]])

Update: Jon, when I try your method of passing in three parameters (values from three dataframe columns for a given row), I get the following error:

def get(name,start_date,end_date):
    try:
        df = ...

response = df.apply(get, axis=1, args=('name', 'date', 'today')) 

The error relates to the arguments. I don't understand why it mentions 4 arguments when I have passed in three and the function only takes three.

Error:

TypeError: ('getprice() takes exactly 3 arguments (4 given)', u'occurred at index 0')

Upvotes: 1

Views: 12504

Answers (2)

Jon Clements

Reputation: 142206

Unless you really want a function to do this, you can use DataFrame operations, e.g.:

df[['col1', 'col2', 'col3']].applymap(len).max(axis=1)
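As a rough sketch of the same idea on concrete data: recent pandas releases deprecate DataFrame.applymap (since 2.1, in favour of DataFrame.map), so the version below uses Series.str.len() per column instead, which should behave the same way for string columns. The sample values and the result column name word_length are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "col1": ["word1", "anotherword", "yetanotherword"],
    "col2": ["word10", "wooooord", "letter"],
    "col3": ["wordover9000", "test", "Ihavenootheridea"],
})

cols = ["col1", "col2", "col3"]

# Series.str.len() gives the length of every string in one column;
# applying it column by column yields a dataframe of lengths.
lengths = df[cols].apply(lambda s: s.str.len())

# Row-wise maximum across the three length columns.
df["word_length"] = lengths.max(axis=1)

print(df)
```

This keeps the work inside pandas rather than calling a Python function once per row, which tends to matter on larger frames.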

Alternatively, use apply's args argument to pass in the names of the columns to process, and have the target function accept a variable number of arguments, e.g.:

def max_word_length(row, *cols):
    return row[list(cols)].map(len).max()

# Make sure `axis=1` so rows are passed in and we can access columns
df.apply(max_word_length, axis=1, args=('col1', 'col2', 'col3'))
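Regarding the TypeError in the question's update, a likely explanation is that apply with axis=1 always passes the row itself as the first positional argument and appends the contents of args after it, so a function called with three args receives four arguments in total. The sketch below assumes a hypothetical function describe and made-up sample values; only the column names 'name', 'date' and 'today' come from the question.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "name": ["a", "bb"],
    "date": ["2014", "2015"],
    "today": ["x", "y"],
})

def describe(row, name_col, start_col, end_col):
    # `row` is the Series for one row, supplied by apply itself.
    # The other three parameters receive the strings passed via `args`,
    # which are column labels, not the column values.
    return f"{row[name_col]}:{row[start_col]}:{row[end_col]}"

# One row argument plus three entries in `args` makes four arguments.
response = df.apply(describe, axis=1, args=("name", "date", "today"))

print(response.tolist())
```

Adding a leading row parameter to the function, and looking the values up from it, is one way to make the signature match what apply sends.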

Upvotes: 1

ysearka

Reputation: 3855

I think you need a lambda function in your apply:

def WordLength(words):
    # words is one row (a Series); .iloc selects by position rather than by label
    return max(len(words.iloc[0]),len(words.iloc[1]),len(words.iloc[2]))

df['wordlength'] = df[['col1','col2','col3']].apply(lambda x: WordLength(x),axis=1)

Output:

    col1            col2        col3                wordlength
0   word1           word10      wordover9000        12
1   anotherword     wooooord    test                11
2   yetanotherword  letter      Ihavenootheridea    16
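If the three-argument signature from the question should stay unchanged, one possible alternative is to iterate over the columns in parallel with zip and call the function once per row. This is only a sketch: the data mirrors the output table above, and the list comprehension is a swapped-in technique rather than anything apply provides.

```python
import pandas as pd

def WordLength(col1, col2, col3):
    # Original three-argument form: each parameter is a single string.
    return max(len(col1), len(col2), len(col3))

# Sample data matching the output table above.
df = pd.DataFrame({
    "col1": ["word1", "anotherword", "yetanotherword"],
    "col2": ["word10", "wooooord", "letter"],
    "col3": ["wordover9000", "test", "Ihavenootheridea"],
})

# zip walks the three columns in step, yielding one value from each per row.
df["wordlength"] = [
    WordLength(a, b, c)
    for a, b, c in zip(df["col1"], df["col2"], df["col3"])
]

print(df)
```

Because the function receives plain values rather than a row, it needs no knowledge of the dataframe or its column names.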

Upvotes: 3
