Reputation: 11
I don't understand how the "row" argument should be used when a function also takes other arguments. I want to create a function that calculates a new column for my dataframe "file".
This works:
def imputation(row):
    if (row['hour_y'] == 0) & (row['outlier_idx'] == True):
        val = file['HYDRO'].mean()
    else:
        val = row['HYDRO']
    return val

file['minute_corr'] = file.apply(imputation, axis=1)
But this does not work (I added an argument):
def imputation(row, variable):
    if (row['hour_y'] == 0) & (row['outlier_idx'] == True):
        val = file[variable].mean()
    else:
        val = row[variable]
    return val

file['minute_corr'] = file.apply(imputation(,'HYDRO'), axis=1)
Upvotes: 1
Views: 50
Reputation: 143
The apply method can pass extra positional and keyword arguments on to your function:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html
For the last line, try:
file['minute_corr'] = file.apply(imputation,args=('HYDRO',), axis=1)
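A self-contained sketch of both options, assuming a small made-up DataFrame that reuses the question's column names (the values are hypothetical). `DataFrame.apply` passes the `args` tuple to the function as extra positional arguments after the row, and forwards any extra keyword arguments as well:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

def imputation(row, variable):
    # Outliers at hour 0 get the column mean; every other row keeps its value
    if row['hour_y'] == 0 and row['outlier_idx']:
        return file[variable].mean()
    return row[variable]

# Extra positional argument: args must be a tuple, hence the trailing comma
file['minute_corr'] = file.apply(imputation, axis=1, args=('HYDRO',))

# Equivalent: extra keyword arguments are passed through to the function
file['minute_corr_kw'] = file.apply(imputation, axis=1, variable='HYDRO')

# HYDRO mean is 25.0, so both columns hold [25.0, 20.0, 30.0, 25.0]
```

The trailing comma in `('HYDRO',)` matters: without it, `('HYDRO')` is just a string, not a one-element tuple.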
Upvotes: 0
Reputation: 21709
You can also pass a lambda function to apply and write the whole computation inline:
file['minute_corr'] = file.apply(
    lambda row: file['HYDRO'].mean() if (row['hour_y'] == 0) and row['outlier_idx'] else row['HYDRO'], axis=1)
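The lambda idea also applies to the original two-argument function: wrapping it in a lambda, or binding the extra argument with `functools.partial`, gives `apply` the one-argument callable it expects. A minimal sketch on hypothetical sample data with the question's column names:

```python
from functools import partial

import pandas as pd

# Hypothetical sample data using the column names from the question
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

def imputation(row, variable):
    # Outliers at hour 0 get the column mean; every other row keeps its value
    if row['hour_y'] == 0 and row['outlier_idx']:
        return file[variable].mean()
    return row[variable]

# A lambda turns the two-argument function into a one-argument callable
file['via_lambda'] = file.apply(lambda row: imputation(row, 'HYDRO'), axis=1)

# functools.partial does the same by binding variable='HYDRO' ahead of time
file['via_partial'] = file.apply(partial(imputation, variable='HYDRO'), axis=1)
```

Either form works because `apply` only ever calls the function with the row; the column name is already fixed inside the callable.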
Upvotes: 0
Reputation: 210842
Try this vectorized approach:
import numpy as np

file['minute_corr'] = np.where((file['hour_y'] == 0) & file['outlier_idx'],
                               file['HYDRO'].mean(),
                               file['HYDRO'])
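For comparison, here is a self-contained sketch on hypothetical sample data that puts the `np.where` version next to `Series.mask`, which replaces values wherever the condition is True. Both work on whole columns at once instead of calling a Python function per row. It assumes `outlier_idx` has a boolean dtype:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question
file = pd.DataFrame({
    'hour_y': [0, 0, 5, 0],
    'outlier_idx': [True, False, True, True],
    'HYDRO': [10.0, 20.0, 30.0, 40.0],
})

# Parentheses are required because & binds more tightly than ==
cond = (file['hour_y'] == 0) & file['outlier_idx']
hydro_mean = file['HYDRO'].mean()  # computed once, not once per row

# np.where picks hydro_mean where cond is True, else the original value
file['minute_corr'] = np.where(cond, hydro_mean, file['HYDRO'])

# Series.mask expresses the same replacement directly on the column
file['minute_corr_mask'] = file['HYDRO'].mask(cond, hydro_mean)
```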
Upvotes: 1