C. Marlet

Reputation: 11

Python: apply a function to a DataFrame by row

I don't understand how the "row" argument should be used when creating a function that also takes other arguments. I want to create a function that calculates a new column in my DataFrame "file".

This works:

def imputation(row):
    if (row['hour_y'] == 0) & (row['outlier_idx'] == True):
        val = file['HYDRO'].mean()
    else:
        val = row['HYDRO']
    return val

file['minute_corr'] = file.apply(imputation, axis=1) 

But this does not work (I added an argument):

def imputation(row, variable):
    if (row['hour_y'] == 0) & (row['outlier_idx'] == True):
        val = file[variable].mean()
    else:
        val = row[variable]
    return val

file['minute_corr'] = file.apply(imputation(,'HYDRO'), axis=1)

Upvotes: 1

Views: 50

Answers (3)

JayBee

Reputation: 143

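The apply method can forward extra positional arguments (through the `args` tuple) and extra keyword arguments to your function. The page below documents `Series.apply`; `DataFrame.apply` accepts the same `args` and `**kwargs`: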

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html

For the last line, try (the trailing comma makes `('HYDRO',)` a one-element tuple):

file['minute_corr'] = file.apply(imputation,args=('HYDRO',), axis=1)
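A minimal, self-contained sketch of both forms, using made-up sample data with the question's column names. With `axis=1`, each row is passed to the function as a Series in the first position, and whatever is in `args` (or any extra keyword argument) follows it:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

def imputation(row, variable):
    # apply(axis=1) passes each row as a Series; extra arguments come after it.
    if row['hour_y'] == 0 and row['outlier_idx']:
        return file[variable].mean()
    return row[variable]

# Extra positional arguments go in the `args` tuple.
by_args = file.apply(imputation, axis=1, args=('HYDRO',))

# Keyword arguments that apply itself does not use are forwarded to the function.
by_kwargs = file.apply(imputation, axis=1, variable='HYDRO')

print(by_args.tolist())  # [25.0, 20.0, 30.0, 25.0]
```

Both calls produce the same column; the keyword form avoids the easy-to-forget trailing comma.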

Upvotes: 0

YOLO

Reputation: 21709

You can also write the condition inline as a lambda passed to apply:

file['minute_corr'] = file.apply(
    lambda row: (file['HYDRO'].mean()
                 if (row['hour_y'] == 0) & (row['outlier_idx'] == True)
                 else row['HYDRO']),
    axis=1)
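The lambda above hardcodes 'HYDRO'. To keep the question's two-argument `imputation(row, variable)`, a lambda or `functools.partial` can bind the column name so that apply only has to supply the row. A sketch with made-up sample data:

```python
import functools

import pandas as pd

# Hypothetical sample data using the question's column names.
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

def imputation(row, variable):
    if (row['hour_y'] == 0) & (row['outlier_idx'] == True):
        return file[variable].mean()
    return row[variable]

# The lambda fixes `variable`, so apply calls it with just the row.
with_lambda = file.apply(lambda row: imputation(row, 'HYDRO'), axis=1)

# functools.partial binds the same argument without writing a lambda.
with_partial = file.apply(functools.partial(imputation, variable='HYDRO'), axis=1)

print(with_lambda.tolist())  # [25.0, 20.0, 30.0, 25.0]
```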

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210842

Try this vectorized approach:

import numpy as np

file['minute_corr'] = np.where((file['hour_y'] == 0) & file['outlier_idx'],
                               file['HYDRO'].mean(),
                               file['HYDRO'])
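The vectorized version can take the column name as a parameter too, which is what the question was after. A sketch assuming `outlier_idx` is a boolean column; `impute_column` is a hypothetical helper name and the data is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the question's column names.
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

def impute_column(df, variable):
    """Hypothetical helper: vectorized form of imputation(row, variable)."""
    # Boolean mask over all rows at once (assumes outlier_idx is bool).
    mask = (df['hour_y'] == 0) & df['outlier_idx']
    # The mean is computed once for the column instead of once per row.
    return np.where(mask, df[variable].mean(), df[variable])

file['minute_corr'] = impute_column(file, 'HYDRO')
print(file['minute_corr'].tolist())  # [25.0, 20.0, 30.0, 25.0]
```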

Upvotes: 1
