IRK

Reputation: 95

Find days between 2 dates in Python, but only the number

I was trying to find the difference between a series of dates and a single date. For example, the series runs from May 1 to June 1:

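In [0]: import pandas as pd
   ...: from datetime import datetime as dt  # assumed; the dt.strptime call below implies this import
   ...: date = pd.DataFrame()
   ...: date['test'] = pd.date_range("2021-05-01", "2021-06-01", freq="D")
   ...: date

Out[0]: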
    test
0   2021-05-01 00:00:00
1   2021-05-02 00:00:00
2   2021-05-03 00:00:00
3   2021-05-04 00:00:00
4   2021-05-05 00:00:00
5   2021-05-06 00:00:00
6   2021-05-07 00:00:00
7   2021-05-08 00:00:00
8   2021-05-09 00:00:00
9   2021-05-10 00:00:00

In [1]: date['test'] = date['test'].dt.date

Out[1]:
    test
0   2021-05-01
1   2021-05-02
2   2021-05-03
3   2021-05-04
4   2021-05-05
5   2021-05-06
6   2021-05-07
7   2021-05-08
8   2021-05-09
9   2021-05-10

In [2]: date['base'] = dt.strptime("2021-05-01", '%Y-%m-%d')

Out[2]:
0   2021-05-01 00:00:00
1   2021-05-01 00:00:00
2   2021-05-01 00:00:00
3   2021-05-01 00:00:00
4   2021-05-01 00:00:00
5   2021-05-01 00:00:00
6   2021-05-01 00:00:00
7   2021-05-01 00:00:00
8   2021-05-01 00:00:00
9   2021-05-01 00:00:00

In [3]: date['base'] = date['base'].dt.date

Out[3]:
    base
0   2021-05-01
1   2021-05-01
2   2021-05-01
3   2021-05-01
4   2021-05-01
5   2021-05-01
6   2021-05-01
7   2021-05-01
8   2021-05-01
9   2021-05-01

In [4]: date['test'] - date['base']

Out[4]: 
    diff
0   0 days 00:00:00.000000000
1   1 days 00:00:00.000000000
2   2 days 00:00:00.000000000
3   3 days 00:00:00.000000000
4   4 days 00:00:00.000000000
5   5 days 00:00:00.000000000
6   6 days 00:00:00.000000000
7   7 days 00:00:00.000000000
8   8 days 00:00:00.000000000
9   9 days 00:00:00.000000000
10  10 days 00:00:00.000000000

This is the only result I could get. I don't want anything other than the numbers (0 through 10 here), because I need them for further numerical calculations, but I can't get rid of the "days 00:00:00.000000000" part. Also, how can I construct a time series that outputs just the date, without the hours/minutes/seconds after it? I don't want to call .dt.date manually on every column, and it sometimes messes things up.

Upvotes: 1

Views: 130

Answers (3)

ti7

Reputation: 18866

Alternatively, with a simple daily series and the default 0-based integer index, you can use the index itself as the day offset (that's how the DataFrame was generated)!

>>> import pandas as pd
>>> df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq = "D")})
>>> df["days"] = df.index
>>> df
         date  days
0  2021-05-01     0
1  2021-05-02     1
2  2021-05-03     2
3  2021-05-04     3
...
31 2021-06-01    31
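A quick sanity check of this shortcut, as a sketch: it assumes one row per consecutive day and the default 0-based RangeIndex, which is what the DataFrame constructor produces here. After filtering, the index keeps its old labels and no longer starts at 0 (the 2021-05-10 cutoff below is only an illustrative choice).

```python
import pandas as pd

# Same frame as above: one row per day, default RangeIndex 0..31
df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq="D")})
df["days"] = df.index

# Cross-check the index against the real day offset from the first date
offset = (df["date"] - df["date"].iloc[0]).dt.days
print((df["days"] == offset).all())  # True

# Filtering keeps the original index labels, so the index stops counting from 0
later = df[df["date"] >= "2021-05-10"]
print(later.index[0])  # 9

# reset_index(drop=True) renumbers from 0, i.e. offsets then count from the new first row
later_reset = later.reset_index(drop=True)
print(later_reset.index[0])  # 0
```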

Upvotes: 0

ti7

Reputation: 18866

You can first convert the timestamps to epoch seconds (internally, datetime64[ns] values are stored as 64-bit integers counting nanoseconds since the Unix epoch).

See also: Using pandas datetime to unix timestamp seconds

import pandas as pd
# start df with date column
df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq = "D")})
# create a column of epoch seconds (time since 1970-01-01, floor-divided by one second)
df["ts"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
>>> df
         date          ts
0  2021-05-01  1619827200
1  2021-05-02  1619913600
2  2021-05-03  1620000000
3  2021-05-04  1620086400
...
31 2021-06-01  1622505600

This lets you do plain integer math before converting back:

>>> df["days"] = (df["ts"] - df["ts"].min()) // (60*60*24)  # 1 day in seconds
>>> df
         date          ts  days
0  2021-05-01  1619827200     0
1  2021-05-02  1619913600     1
2  2021-05-03  1620000000     2
3  2021-05-04  1620086400     3
...
31 2021-06-01  1622505600    31
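A sketch of the internal-storage point and of the "converting back" step, rebuilding the same df. The cast to datetime64[ns] through NumPy is an assumption-free way to get nanoseconds regardless of the resolution pandas picks for the column; pd.to_datetime(..., unit="s") and pd.to_timedelta(..., unit="D") are standard pandas functions for turning numbers back into datetimes.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq="D")})
df["ts"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
df["days"] = (df["ts"] - df["ts"].min()) // (60 * 60 * 24)  # 1 day in seconds

# Raw datetime64 values, cast to nanosecond resolution and viewed as int64,
# are nanoseconds since the epoch; dividing by 10**9 reproduces "ts"
ns = df["date"].to_numpy().astype("datetime64[ns]").view("int64")
print(ns[0])  # 1619827200000000000
print((ns // 10**9 == df["ts"].to_numpy()).all())  # True

# Converting back: epoch seconds -> datetimes
back = pd.to_datetime(df["ts"], unit="s")
print((back == df["date"]).all())  # True

# ...or day offsets -> datetimes, relative to the first date
rebuilt = df["date"].iloc[0] + pd.to_timedelta(df["days"], unit="D")
print((rebuilt == df["date"]).all())  # True
```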

Upvotes: 0

Sayandip Dutta

Reputation: 15872

You don't need to create a 'base' column for this; simply do:

>>> (date['test'] - pd.to_datetime("2021-05-01", format='%Y-%m-%d')).dt.days
0      0
1      1
2      2
3      3
4      4
...
27    27
28    28
29    29
30    30
31    31
Name: test, dtype: int64
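A self-contained sketch combining this answer with the question's second part, assuming the goal is to keep the column as datetime64 rather than datetime.date objects. Using pd.Timestamp for the base date (instead of dt.strptime followed by .dt.date) keeps both operands datetime64, so the difference is timedelta64 and .dt.days works directly. pandas normally prints an all-midnight datetime64 column as plain dates (as in the other answers' output); .dt.normalize() and .dt.strftime cover the remaining cases. The 13-hour offset below is only an illustration.

```python
import pandas as pd

date = pd.DataFrame({"test": pd.date_range("2021-05-01", "2021-06-01", freq="D")})

# Keep everything datetime64: a pandas Timestamp instead of strptime + .dt.date
base = pd.Timestamp("2021-05-01")

# Subtraction gives timedelta64 values; .dt.days extracts whole days as int64
date["days"] = (date["test"] - base).dt.days
print(date["days"].tolist()[:5])  # [0, 1, 2, 3, 4]

# If a column was already turned into datetime.date objects with .dt.date,
# pd.to_datetime converts it back to datetime64 so .dt.days works again
as_objects = date["test"].dt.date
recovered = (pd.to_datetime(as_objects) - base).dt.days
print(recovered.tolist() == date["days"].tolist())  # True

# .dt.normalize() resets any time-of-day to midnight but keeps the datetime64 dtype
stamped = date["test"] + pd.Timedelta(hours=13)
print((stamped.dt.normalize() == date["test"]).all())  # True

# For display only: plain "YYYY-MM-DD" strings (not usable for date arithmetic)
print(date["test"].dt.strftime("%Y-%m-%d").iloc[0])  # 2021-05-01
```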

Upvotes: 1
