Reputation: 4029
Right now I am iterating over a pandas DataFrame row by row to patch data inconsistencies between two datetime columns (datetime_column1 should never be later than datetime_column2), but the runtime is brutal.
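def fix(row):
    # keep the earlier of the two timestamps for this row
    return row.datetime_column2 if row.datetime_column1 > row.datetime_column2 else row.datetime_column1

# apply(..., axis=1) calls fix once per row in Python, which is why it is slow
df['datetime_column1'] = df.apply(fix, axis=1)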
Is there a smarter way to do this?
Upvotes: 2
Views: 35
Reputation: 1609
Where pandas/NumPy offer a vectorized operation, avoid row-wise loops such as apply(..., axis=1). Try the following; it should work for datetime columns too.
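import pandas as pd
df = pd.DataFrame(data={'column_1': [1, 3, 5, 5], 'column_2': [0, 4, 1, 6]})
# where column_1 exceeds column_2, overwrite it with column_2 (aligned on the index)
df.loc[df.column_1 > df.column_2, 'column_1'] = df.column_2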
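Since the answer says this should work for dates too, here is a minimal sketch applying the same `.loc` masking to two datetime columns. The column names come from the question; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 have datetime_column1 later
# than datetime_column2, row 1 is already consistent.
df = pd.DataFrame({
    'datetime_column1': pd.to_datetime(['2021-01-05', '2021-01-01', '2021-03-10']),
    'datetime_column2': pd.to_datetime(['2021-01-03', '2021-01-02', '2021-03-01']),
})

# Boolean mask of the inconsistent rows.
mask = df['datetime_column1'] > df['datetime_column2']

# Overwrite only the masked rows; the right-hand side is aligned on the index.
df.loc[mask, 'datetime_column1'] = df.loc[mask, 'datetime_column2']

print(df['datetime_column1'].dt.strftime('%Y-%m-%d').tolist())
# ['2021-01-03', '2021-01-01', '2021-03-01']
```

Only the masked rows are written, so the consistent rows keep their original values and the column stays a datetime column.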
Upvotes: 0
Reputation: 210832
If I understand correctly, you can use this vectorized approach:
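import numpy as np

# take datetime_column2 where datetime_column1 is later, otherwise keep datetime_column1
df['datetime_column1'] = np.where(df['datetime_column1'] > df['datetime_column2'],
                                  df['datetime_column2'],
                                  df['datetime_column1'])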
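A minimal, self-contained sketch of the `np.where` version; the two-row frame is hypothetical and only the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame: row 0 is inconsistent, row 1 is fine.
df = pd.DataFrame({
    'datetime_column1': pd.to_datetime(['2021-01-05', '2021-01-01']),
    'datetime_column2': pd.to_datetime(['2021-01-03', '2021-01-02']),
})

# The comparison is evaluated for all rows at once; np.where then picks
# datetime_column2 where the condition is True, datetime_column1 otherwise.
df['datetime_column1'] = np.where(
    df['datetime_column1'] > df['datetime_column2'],
    df['datetime_column2'],
    df['datetime_column1'],
)

print(df)
```

`np.where` returns a NumPy array rather than a Series, so the result is assigned back by position; that is safe here because both inputs come from the same frame in the same row order.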
or:
# row-wise minimum of the two columns
df['datetime_column1'] = df[['datetime_column1', 'datetime_column2']].min(axis=1)
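The two variants are not identical when values are missing. A sketch with a hypothetical `NaT` in `datetime_column1`: `min(axis=1)` skips `NaT` by default (`skipna=True`), while any comparison involving `NaT` evaluates to `False`, so the `np.where` version leaves the missing value in place.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; row 1 has a missing (NaT) value in datetime_column1.
df = pd.DataFrame({
    'datetime_column1': pd.to_datetime(['2021-01-05', None]),
    'datetime_column2': pd.to_datetime(['2021-01-03', '2021-01-02']),
})
cols = ['datetime_column1', 'datetime_column2']

# Row-wise minimum: NaT is ignored, so row 1 becomes 2021-01-02.
row_min = df[cols].min(axis=1)

# Comparison-based variant: NaT > x is False, so row 1 stays NaT.
where_min = pd.Series(
    np.where(df['datetime_column1'] > df['datetime_column2'],
             df['datetime_column2'],
             df['datetime_column1']),
    index=df.index,
)

print(row_min)
print(where_min)
```

Which behaviour is preferable depends on whether a missing `datetime_column1` should be filled from `datetime_column2` or left missing.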
Upvotes: 1