Vishal
Vishal

Reputation: 20617

How do I find missing dates in a list of sorted dates?

In Python how do I find all the missing days in a sorted list of dates?

Upvotes: 23

Views: 20443

Answers (10)

Henul
Henul

Reputation: 232

Assuming you have imported the pandas library:

incomplete_dates = pd.DataFrame({'dates': ['2021-01-10', '2021-01-14', '2021-01-15','2021-01-20']})

incomplete_dates = pd.to_datetime(incomplete_dates['dates'])

complete_dates = pd.date_range(start="2021-01-10", end="2021-01-20", freq='D')
 
missing_dates = complete_dates.difference(incomplete_dates)

print (missing_dates)

>>> DatetimeIndex(['2021-01-11', '2021-01-12', '2021-01-13', '2021-01-16',
               '2021-01-17', '2021-01-18', '2021-01-19'],
              dtype='datetime64[ns]', freq=None)
  1. original dataset with missing dates

  2. convert string dates into datetime format

  3. complete range of time period

  4. function for finding missing dates

learn more from: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.difference.html

Upvotes: 0

Brndn
Brndn

Reputation: 846

Here's an example for a pandas dataframe with a date column. If it's an index, then change df.Date to df.index.

import pandas as pd

df.Date = pd.to_datetime(df.Date) # ensure datetime format of date column
min_dt = df.Date.min() # get lowest date
max_dt = df.Date.max() # get highest date

dt_range = pd.date_range(min_dt, max_dt) # get all requisite dates in range
missing_dts = [d for d in dt_range if d not in df.Date] # list missing
print("There are {n} missing dates".format(n=len(missing_dts)))

Upvotes: 1

Nelis
Nelis

Reputation: 91

USING A FOR LOOP

The imports you'll need:

import datetime
from datetime import date, timedelta

Let's say you have a sorted list called dates with several missing dates in it.

First select the first and last date:

start_date = dates[0]
end_date = dates[len(dates)-1]

Than count the number of days between these two dates:

numdays = (end_date - start_date).days

Than create a new list with all dates between start_date and end_date:

all_dates = []
for x in range (0, (numdays+1)):
all_dates.append(start_date + datetime.timedelta(days = x))

Than check with dates are in all_dates but not in dates by using a for loop with range and adding these dates to dates_missing:

dates_missing = []

for i in range (0, len(all_dates)):
   if (all_dates[i] not in dates):
       dates_missing.append(all_dates[i])
   else:
       pass

Now you'll have a list called dates_missing with all the missing dates.

Upvotes: 2

Bishwas Mishra
Bishwas Mishra

Reputation: 1342

A good way of getting this done in Python is as follows. You need not worry about efficiency unless you have dates from multiple years in your list and this code always needs to run as per user interaction and yield output immediately.

  1. Get missing dates from one list (sorted or not)

Create a function that gives you all the dates from start_date to end_date. And use it.

import datetime

def get_dates(start_date, end_date):
    span_between_dates = range(end_date - start_date).days
    for index in span_between_dates + 1:
        # +1 is to make start and end dates inclusive.
        yield start_date + datetime.timedelta(index)

my_date_list = ['2017-03-05', '2017-03_07', ...]
# Edit my_date_list as per your requirement.
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
    if date not in my_date_list:
        print date
  1. Get missing or overlapping dates between two date ranges.

get_dates function should be defined.

my_other_date_list = []  # your other date range
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date)
    if (date in my_date_range) and (date in my_other_date_list):
        print ('overlapping dates between 2 lists:')
        print date
    elif (date in my_date_range) and (date not in my_other_date_list):
        print ('missing dates:')
        print date

Upvotes: 0

John La Rooy
John La Rooy

Reputation: 304205

using sets

>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
         date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
>>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
>>> missing = sorted(date_set - set(d))
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
>>> 

Upvotes: 32

John La Rooy
John La Rooy

Reputation: 304205

Using a list comprehension

>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23),date(2010, 2, 24),date(2010, 2, 25),date(2010, 2, 26),date(2010, 3, 1),date(2010, 3, 2)]
>>> date_set=set(d)
>>> missing = [x for x in (d[0]+timedelta(x) for x in range((d[-1]-d[0]).days)) if x not in date_set]

>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]

Upvotes: 0

John La Rooy
John La Rooy

Reputation: 304205

>>> from datetime import datetime, timedelta
>>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
>>> 
>>> date_set=set(date_list)         # for faster membership tests than list
>>> one_day = timedelta(days=1)
>>> 
>>> test_date = date_list[0]
>>> missing_dates=[]
>>> while test_date < date_list[-1]:
...     if test_date not in date_set:
...         missing_dates.append(test_date)
...     test_date += one_day
... 
>>> print missing_dates
[datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]

This also works for datetime.date objects, but the OP says the list is datetime.datetime objects

Upvotes: 2

jfs
jfs

Reputation: 414295

import datetime

DAY = datetime.timedelta(days=1)
# missing dates: a list of [start_date, end)
missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]

def date_range(start_date, end, step=DAY):
    d = start_date
    while d < end:
        yield d
        d += step

missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]

Upvotes: 0

Mark Byers
Mark Byers

Reputation: 838296

Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.

Here's one way to implement it:

from datetime import date, timedelta
from itertools import tee, izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    b.next()
    return izip(a, b)

def missing_dates(dates):
    for prev, curr in pairwise(sorted(dates)):
        i = prev
        while i + timedelta(1) < curr:
            i += timedelta(1)
            yield i

dates = [ date(2010, 1, 8),
          date(2010, 1, 2),
          date(2010, 1, 5),
          date(2010, 1, 1),
          date(2010, 1, 7) ]

for missing in missing_dates(dates):
    print missing

Output:

2010-01-03
2010-01-04
2010-01-06

Performance is O(n*log(n)) where n is the number of days in the span when the input is unsorted. As your list is already sorted, it will run in O(n).

Upvotes: 4

Ignacio Vazquez-Abrams
Ignacio Vazquez-Abrams

Reputation: 798706

Put the dates in a set and then iterate from the first date to the last using datetime.timedelta(), checking for containment in the set each time.

Upvotes: 1

Related Questions