Federico Gentile
Federico Gentile

Reputation: 5940

Apply function between subsequent rows in more efficient way with Pandas

I have a dataframe df that is defined like so:

import numpy as np
import pandas as pd
dic = {'A':['1A','1A','3C','3C','3C','7M','7M','7M'],'B':[10,15,49,75,35,33,45,65],'C':[11,56,32,78,45,89,15,14],'D':[111,0,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],'E':[0,222,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]}

df = pd.DataFrame(dic)

My goal is to perform some calculations between rows having the same item in column A.

The function is defined like this (but could be anything):

def fun(a,b,c,d):
    out = a*c + b/2 + d*b
    return out

The result of such operation will be stored in column D and E according to the following rules:

# Fill column D
for j in range(0,len(df)-1):
    if df['A'].iloc[j]==df['A'].iloc[j+1] and pd.isnull(df['D'].iloc[j]):
        df['D'].iloc[j] = fun(df['B'].iloc[j],df['B'].iloc[j],df['B'].iloc[j+1],df['B'].iloc[j+1])  

# Fill column E       
for j in reversed(range(1,len(df))):
    if df['A'].iloc[j-1]==df['A'].iloc[j] and pd.isnull(df['E'].iloc[j]):
        df['E'].iloc[j] = fun(df['B'].iloc[j],df['B'].iloc[j],df['B'].iloc[j-1],df['B'].iloc[j-1])   

Both loops are very similar but the second one is looping from the last element to the first one of the dataframe. My code works correctly and the result should be like this:

          # Before #                         # After #
    A   B   C    D    E          A   B   C       D       E
0  1A  10  11  111    0      0  1A  10  11   111.0     0.0
1  1A  15  56    0  222      1  1A  15  56     0.0   222.0
2  3C  49  32  NaN  NaN      2  3C  49  32  7374.5     NaN
3  3C  75  78  NaN  NaN      3  3C  75  78  5287.5  7387.5
4  3C  35  45  NaN  NaN      4  3C  35  45     NaN  5267.5
5  7M  33  89  NaN  NaN      5  7M  33  89  2986.5     NaN
6  7M  45  15  NaN  NaN      6  7M  45  15  5872.5  2992.5
7  7M  65  14  NaN  NaN      7  7M  65  14     NaN  5882.5

Would you be able to improve such code in order to make it more efficient using some build in functions from the Pandas library? I would like to have some more elegant way to achieve my result.

Note: the first and second row have already a value (111 0 and 0 222) and therefore they must not be calculated by the function!

Upvotes: 4

Views: 2622

Answers (3)

Federico Gentile
Federico Gentile

Reputation: 5940

In order to solve my problem I defined another function which takes as input fun

def fun2(df,s):
    X= fun(df.B,df.C,df.B.shift(s),df.C.shift(s))
    return X

Columns D and E can be filled like so:

df2['D']=np.where((df2.A.shift(-1)==df2.A) & (df2.D.isnull()==True),fun2(df2,-1),df2.D) 
df2['E']=np.where((df2.A.shift(1)==df2.A) & (df2.E.shift(1).isnull()==True),fun2(df2,+1),df2.E)

Note: even though more compact, this approach may be slower

Upvotes: 0

Louis R
Louis R

Reputation: 1790

You could first group by the values of A, and then apply a vectorized function :

def fun(a,b,c,d):
    out = a*c + b/2 + d*b
    return out

def apply_func(df):
    mask = pd.isnull(df['D'][:-1])
    df['D'][:-1][mask] = fun(df['B'][:-1].values, df['B'][:-1].values, 
                             df['B'][1:].values, df['B'][1:].values)
    mask = pd.isnull(df['E'][1:])
    df['E'][1:][mask] = fun(df['B'][1:].values, df['B'][1:].values, 
                            df['B'][:-1].values, df['B'][:-1].values)
    return df

and then :

df = df.groupby('A').apply(apply_func).reset_index(drop=True)

    A   B   C       D       E
0  1A  10  11   305.0     NaN
1  1A  15  56     NaN   307.5
2  3C  49  32  7374.5     NaN
3  3C  75  78  5287.5  7387.5
4  3C  35  45     NaN  5267.5
5  7M  33  89  2986.5     NaN
6  7M  45  15  5872.5  2992.5
7  7M  65  14     NaN  5882.5

Upvotes: 2

Shijo
Shijo

Reputation: 9711

you could use np.where and dataframe.shift()

  • np.where works like an if statement
  • datafrmae.shift() - Shift index by desired number of periods with an optional time freq

     df['D']=np.where(df.A.shift(-1)==df.A,func(df['B'],df['B'],df.B.shift(-1),df.B.shift(-1)),np.NaN)
    

Upvotes: 1

Related Questions