Reputation: 99
I am using an AirBnb dataset. I have a column, 'host_since'. The column contains date objects in the format of 'DD/MM/YYYY': for example, 24/09/2008. The columns' data shows the date that an individual became a host.
I want to create a new column in my dataframe that contains the the number of days since the host first joined. I am aware of the to_datetime
method but cant quite understand the documentation.
note: df['host_days'] does not exist in the df. I want to create a column then assign the appropriate steps to get my outcome
def add_host_days(df):
df['host_days'] = df['host_since'].to_datetime()
return df
Any ideas on how to do so? Thank you for your input.
Upvotes: 1
Views: 1957
Reputation: 23217
You can try this:
def add_host_days(df):
df['host_days'] = (pd.Timestamp.now() - pd.to_datetime(df['host_since'], dayfirst=True)).dt.days
# If you original date fields have invalid dates and would like this number of days to be in integer:
df['host_days'] = df['host_days'].astype('Int64')
return df
Suppose you have a dataframe like this:
guest_id host_since
0 A0001 24/09/2008
1 A0002 25/09/2008
2 A0003 29/09/2008
3 A8788 20/05/2021
Then you run the code:
new_df = add_host_days(df)
Result:
print(new_df)
guest_id host_since host_days
0 A0001 24/09/2008 4629
1 A0002 25/09/2008 4628
2 A0003 29/09/2008 4624
3 A8788 20/05/2021 8
Upvotes: 2
Reputation: 345
Couldn't test it as you haven't given any reproducible code but you can subtract host_since from current date and capture the days like -
def add_host_days(df):
df['host_days'] = df.apply(lambda x: (datetime.datetime.today() - x.host_since).days, axis=1)
return df
Upvotes: 2