Reputation: 101
I have a pandas DataFrame df_drop which has two date columns, Joined Date and Terminated Date. I want to get the difference (in days) between the terminated date and the joined date, but some rows in the terminated date column hold the text value 'Not_Terminated'. So I tried to convert that value to today's date first and then take the difference. Below is the code I tried for the conversion:
import time
today=time.strftime(("%Y-%m-%d"))
df_drop['TerminatedDate_new'] = [today if x=='Not_Terminated' else df_drop['TerminatedDate'] for x in df_drop['TerminatedDate']]
Although it gives the correct value (today) for the 'Not_Terminated' rows, for rows with a date it returns the entire df_drop['TerminatedDate'] column instead of the existing date (the else part of the code).
How do I change it so that it selects the same row and gives the existing date value?
Also, is there an easy way to get the difference without separately calculating df_drop['TerminatedDate_new']?
Upvotes: 0
Views: 253
Reputation: 144
Your code is longer than it needs to be. A shorter way is to use replace and assign the result back to the column:
df['TerminatedDate'] = df['TerminatedDate'].replace({'Not_Terminated': today})
If you don't want to overwrite the old column, you can save the result to a new column instead.
df['new_col'] = df['TerminatedDate'].replace({'Not_Terminated':today})
The problem with your code is the part else df_drop['TerminatedDate'] for x, which puts the entire column into each cell instead of that row's value. It should be else x for x.
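For reference, here is a minimal sketch of the corrected list comprehension. The two sample rows are made up for illustration; the column names and the 'Not_Terminated' marker follow the question.

```python
import time
import pandas as pd

# Hypothetical sample data standing in for df_drop
df_drop = pd.DataFrame({
    "JoinedDate": ["2015-01-10", "2016-03-01"],
    "TerminatedDate": ["2017-06-30", "Not_Terminated"],
})

today = time.strftime("%Y-%m-%d")

# `else x` keeps the current row's value instead of the whole column
df_drop["TerminatedDate_new"] = [
    today if x == "Not_Terminated" else x
    for x in df_drop["TerminatedDate"]
]
```

Each cell now holds either the original date string or today's date, so the column can be passed to pd.to_datetime afterwards.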
If you want to get the difference in a single step, you can write a custom function and apply it row-wise.
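import pandas as pd

def get_dif(start, end):
    # Rows that are still active count as ending today
    if end == "Not_Terminated":
        end = today
    # Parse both values as dates so the subtraction yields a Timedelta
    return (pd.to_datetime(end) - pd.to_datetime(start)).days

df['new_col'] = df.apply(lambda row: get_dif(row['JoinedDate'], row['TerminatedDate']), axis=1)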
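A row-wise apply can be slow on large frames. As an alternative, here is a hedged sketch of a vectorized version that needs no intermediate column in the DataFrame. It assumes the dates are strings pandas can parse and that 'Not_Terminated' is the only non-date value; the sample rows and the days_employed column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question
df = pd.DataFrame({
    "JoinedDate": ["2015-01-10", "2016-03-01"],
    "TerminatedDate": ["2015-01-20", "Not_Terminated"],
})

# Today's date at midnight, so the difference is in whole days
today = pd.Timestamp.today().normalize()

joined = pd.to_datetime(df["JoinedDate"])

# Blank out the marker rows, parse the rest, then fill the gaps with today
active = df["TerminatedDate"] == "Not_Terminated"
terminated = pd.to_datetime(df["TerminatedDate"].mask(active)).fillna(today)

# Subtracting two datetime Series gives Timedeltas; .dt.days extracts whole days
df["days_employed"] = (terminated - joined).dt.days
```

Masking only the known marker, rather than coercing every unparseable value, means an unexpected string in the column raises an error instead of being silently treated as today.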
Upvotes: 2