Bryce Ramgovind
Bryce Ramgovind

Reputation: 3257

Check if dates are consecutive

I have a pandas dataframe with check-up dates as the index and a category and type of species.

Date    Category    Type
2016-11-21  Cat Animal
2016-11-21  Cat Animal
2017-01-03  Cat Animal
2016-11-01  Dog Animal
2017-01-03  Dog Animal
2016-10-03  Dog Animal
2017-02-01  Dog Animal
2017-03-01  Dog Animal
2016-12-01  Dog Animal
2016-11-21  Horse   Animal
2016-12-15  Horse   Animal
2017-02-04  Horse   Animal

I need to determine for each category whether the dates were consecutive . Consecutive could mean a month is missing, this is seen in the horse category. The month of January is missing for horse, but overall the horse's check-up updates are increasing. So horse is valid. Cat, however, is invalid since it received a check up twice in one day.

The final dataframe should be

Date    Category    Type
2016-11-01  Dog Animal
2017-01-03  Dog Animal
2016-10-03  Dog Animal
2017-02-01  Dog Animal
2017-03-01  Dog Animal
2016-12-01  Dog Animal
2016-11-21  Horse   Animal
2016-12-15  Horse   Animal
2017-02-04  Horse   Animal

Upvotes: 1

Views: 857

Answers (2)

jezrael
jezrael

Reputation: 862431

It seems you need is_unique and is_monotonic_increasing with filter:

df = df.groupby('Category').filter(lambda x: x.index.is_unique and 
                                             x.index.is_monotonic_increasing)
print (df)
           Category    Type
Date                       
2016-11-21    Horse  Animal
2016-12-15    Horse  Animal
2017-02-04    Horse  Animal

Upvotes: 2

FLab
FLab

Reputation: 7466

Here is how you can exclude animals for which the index is not unique:

df.groupby('Category').filter(lambda x: x.index.is_unique)

which gives the desired output:

           Category    Type
Date                       
2016-11-01      Dog  Animal
2017-01-03      Dog  Animal
2016-10-03      Dog  Animal
2017-02-01      Dog  Animal
2017-03-01      Dog  Animal
2016-12-01      Dog  Animal
2016-11-21    Horse  Animal
2016-12-15    Horse  Animal
2017-02-04    Horse  Animal

Your question would suggest you want the dates to be strictly increasing, then you can do:

def index_diff(x):
    return ((d_1 - d_0).days for d_1, d_0 in zip(x.index[1:], x.index[:-1]))

df.groupby('Category').filter(lambda x: min(index_diff(x))> 0)

Although this only returns

           Category    Type
Date                       
2016-11-21    Horse  Animal
2016-12-15    Horse  Animal
2017-02-04    Horse  Animal

since for Dog you have 2017-01-03 appearing before 2016-10-03

Upvotes: 0

Related Questions