Danish
Danish

Reputation: 2871

calculate time in days based on another date column and its first date in pandas

I have a df as shown below

Date                t_factor  
2020-02-01             5
2020-02-06             14    
2020-02-09             23    
2020-02-03             23             
2020-03-11             38            
2020-02-20             29                 
2020-02-13             30              
2020-02-29             100              
2020-03-26             70    

from the I would like to create a column called time_in_days, which will be calculated from the first day in the date column as shown below.

Note: where the column t_factor is unused here

Expected Output:

Date                t_factor     time_in_days
2020-02-01             5             1
2020-02-06             14            6
2020-02-09             23            9
2020-02-03             23            3       
2020-03-11             38            40         
2020-02-20             29            20               
2020-02-13             30            13           
2020-02-29             100           29           
2020-03-26             70            55   

Upvotes: 1

Views: 37

Answers (3)

Deshwal
Deshwal

Reputation: 4152

Try changing the column to Date Time format first. and try to use something like this:

import pandas as pd

lis = '''2020-02-01  
2020-02-06        
2020-02-09        
2020-02-03          
2020-02-11                
2020-02-20                         
2020-02-13                      
2020-02-29                 
2020-02-26'''.replace(' ','')  .split()   # ignore this  


dt = pd.to_datetime(lis)
diff = dt[6]-dt[0]
print(diff.days)

Should do the trick.

df = pd.DataFrame({'date':dt,'random_col':np.random.randn(len(dt))})
df['date_diff'] = df['date'].apply(lambda x: x-df.iloc[0,0])
df

Upvotes: 0

cs95
cs95

Reputation: 402383

Subtract the dates from the first date to get the delta.

# If you have a column of strings,
# df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df['time_in_days_actual'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
df
        Date  t_factor  time_in_days  time_in_days_actual
0 2020-02-01         5             1                    1
1 2020-02-06        14             6                    6
2 2020-02-09        23             9                    9
3 2020-02-03        23             3                    3
4 2020-03-11        38            40                   40
5 2020-02-20        29            20                   20
6 2020-02-13        30            13                   13
7 2020-02-29       100            29                   29
8 2020-03-26        70            55                   55

Upvotes: 1

bigbounty
bigbounty

Reputation: 17368

In [26]: a = ["2020-02-01", "2020-02-03", "2020-02-13", "2020-02-29","2020-03-26"]

In [27]: df = pd.DataFrame(a, columns=["Date"])

In [28]: start_date = datetime.strptime(df.iloc[0]["Date"],"%Y-%m-%d")

In [29]: df["time_in_days"] = df["Date"].apply(lambda x: (datetime.strptime(x,"%Y-%m-%d") - start_date).days+1)

In [30]: df
Out[30]:
         Date  time_in_days
0  2020-02-01             1
1  2020-02-03             3
2  2020-02-13            13
3  2020-02-29            29
4  2020-03-26            55

Upvotes: 1

Related Questions