Groupby and apply different function to each column with first and last

Question

I am trying to group by several columns and then apply a different function to each remaining column. Following the answer here, I wrote the code below:

def f(x):
    d = {}
    d['a'] = x['a'].max()
    d['b'] = x['b'].first()
    d['c'] = x['c'].last()
    return pd.Series(d, index=['a', 'b', 'c'])

require_data = required_data.groupby(['S','id', 'lane', 'timestamp','E']).apply(f)

I get the following error, caused by the call to first:

TypeError: first() missing 1 required positional argument: 'offset'

However, calling first directly on the groupby works fine:

require_data = required_data.groupby(['S','id', 'lane', 'timestamp','E']).first()

What is the cause of the error?

jezrael · Accepted Answer

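The error occurs because, inside apply, x['b'] is a plain Series, so x['b'].first() calls Series.first, a time-series method that requires an offset argument, not GroupBy.first. It is better to use GroupBy.agg here, which lets you map column names to aggregation methods such as GroupBy.first and GroupBy.last: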

require_data = (required_data.groupby(['S','id', 'lane', 'timestamp','E'])
                             .agg({'a':'max', 'b':'first', 'c':'last'}))
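To see the agg call above run end to end, here is a minimal sketch. The sample values in required_data are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
required_data = pd.DataFrame({
    "S": [1, 1, 1, 2, 2],
    "id": [10, 10, 10, 20, 20],
    "lane": [1, 1, 1, 2, 2],
    "timestamp": [100, 100, 100, 200, 200],
    "E": [0, 0, 0, 1, 1],
    "a": [3, 7, 5, 2, 9],
    "b": ["x", "y", "z", "p", "q"],
    "c": [0.1, 0.2, 0.3, 0.4, 0.5],
})

keys = ["S", "id", "lane", "timestamp", "E"]

# One aggregation per column: max of a, first of b, last of c.
require_data = required_data.groupby(keys).agg({"a": "max", "b": "first", "c": "last"})
print(require_data)
```

The result has one row per group, indexed by the grouping keys, with columns a, b and c.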

If you want to use your own custom function, you need to select values by position with Series.iat or Series.iloc. However, as @Erfan pointed out (thank you), using a custom function here is highly discouraged because it is less efficient than the built-in aggregations:

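import pandas as pd

def f(x):
    # x is the sub-DataFrame for a single group
    d = {}
    d['a'] = x['a'].max()
    d['b'] = x['b'].iat[0]   # first row of the group, by position
    d['c'] = x['c'].iat[-1]  # last row of the group, by position
    return pd.Series(d, index=['a', 'b', 'c'])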
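As a rough check that the positional version gives the same values as agg, the sketch below runs both on made-up sample data (the values are hypothetical; only the column names come from the question). Selecting the value columns before apply is my own choice here, so that the grouping keys are not passed into f.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
required_data = pd.DataFrame({
    "S": [1, 1, 1, 2, 2],
    "id": [10, 10, 10, 20, 20],
    "lane": [1, 1, 1, 2, 2],
    "timestamp": [100, 100, 100, 200, 200],
    "E": [0, 0, 0, 1, 1],
    "a": [3, 7, 5, 2, 9],
    "b": ["x", "y", "z", "p", "q"],
    "c": [0.1, 0.2, 0.3, 0.4, 0.5],
})

keys = ["S", "id", "lane", "timestamp", "E"]


def f(x):
    # Positional access works on a plain Series; .first() would not.
    return pd.Series({"a": x["a"].max(), "b": x["b"].iat[0], "c": x["c"].iat[-1]})


# Custom function via apply (slower: f is called once per group).
via_apply = required_data.groupby(keys)[["a", "b", "c"]].apply(f)

# Built-in aggregations via agg.
via_agg = required_data.groupby(keys).agg({"a": "max", "b": "first", "c": "last"})

print(via_apply)
print(via_agg)
```

One difference to keep in mind: GroupBy.first and GroupBy.last return the first and last non-null value in each group, while iat[0] and iat[-1] return whatever is in that position, including NaN. On data without missing values, as here, the two approaches agree.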
