Reputation: 2871
I have a df as shown below
Date t_factor
2020-02-01 5
2020-02-06 14
2020-02-09 23
2020-02-03 23
2020-03-11 38
2020-02-20 29
2020-02-13 30
2020-02-29 100
2020-03-26 70
from the I would like to create a column called time_in_days, which will be calculated from the first day in the date column as shown below.
Note: where the column t_factor is unused here
Expected Output:
Date t_factor time_in_days
2020-02-01 5 1
2020-02-06 14 6
2020-02-09 23 9
2020-02-03 23 3
2020-03-11 38 40
2020-02-20 29 20
2020-02-13 30 13
2020-02-29 100 29
2020-03-26 70 55
Upvotes: 1
Views: 37
Reputation: 4152
Try changing the column to Date Time format first. and try to use something like this:
import pandas as pd
lis = '''2020-02-01
2020-02-06
2020-02-09
2020-02-03
2020-02-11
2020-02-20
2020-02-13
2020-02-29
2020-02-26'''.replace(' ','') .split() # ignore this
dt = pd.to_datetime(lis)
diff = dt[6]-dt[0]
print(diff.days)
Should do the trick.
df = pd.DataFrame({'date':dt,'random_col':np.random.randn(len(dt))})
df['date_diff'] = df['date'].apply(lambda x: x-df.iloc[0,0])
df
Upvotes: 0
Reputation: 402383
Subtract the dates from the first date to get the delta.
# If you have a column of strings,
# df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['time_in_days_actual'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
df
Date t_factor time_in_days time_in_days_actual
0 2020-02-01 5 1 1
1 2020-02-06 14 6 6
2 2020-02-09 23 9 9
3 2020-02-03 23 3 3
4 2020-03-11 38 40 40
5 2020-02-20 29 20 20
6 2020-02-13 30 13 13
7 2020-02-29 100 29 29
8 2020-03-26 70 55 55
Upvotes: 1
Reputation: 17368
In [26]: a = ["2020-02-01", "2020-02-03", "2020-02-13", "2020-02-29","2020-03-26"]
In [27]: df = pd.DataFrame(a, columns=["Date"])
In [28]: start_date = datetime.strptime(df.iloc[0]["Date"],"%Y-%m-%d")
In [29]: df["time_in_days"] = df["Date"].apply(lambda x: (datetime.strptime(x,"%Y-%m-%d") - start_date).days+1)
In [30]: df
Out[30]:
Date time_in_days
0 2020-02-01 1
1 2020-02-03 3
2 2020-02-13 13
3 2020-02-29 29
4 2020-03-26 55
Upvotes: 1