Justin
Justin

Reputation: 327

How to identify consecutive dates

I would like to identify dates in a dataframe which are consecutive, that is there exists either an immediate predecessor or successor. I would then like to mark which dates are and are not consecutive in a new column. Additionally I would like to do this operation within particular subsets of my data.

First I create a new variable where I'd identify True of False for Consecutive Days.

weatherFile['CONSECUTIVE_DAY'] = 'NA'   

I've converted dates into datetime objects then to ordinal ones:

weatherFile['DATE_OBJ'] = [datetime.strptime(d, '%Y%m%d') for d in weatherFile['DATE']]
weatherFile['DATE_INT'] = list([d.toordinal() for d in weatherFile['DATE_OBJ']])

Now I would like to identify consecutive dates in the following groups:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])

I am thinking to loop through the groups and applying an operation that will identify which days are consecutive and which are not, within unique county, tempbin subsets.

I'm rather new to programming and python, is this a good approach so far, if so how can I progress?

Thank you - Let me know if I should provide additional information.

Update:

Using @karakfa advice I tried the following:

weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])
weatherFile['DISTANCE'] = weatherFile[1:, 'DATE_INT'] - weatherFile[:-1,'DATE_INT']
weatherFile['CONSECUTIVE?'] = np.logical_or(np.insert((weatherFile['DISTANCE']),0,0) == 1, np.append((weatherFile['DISTANCE']),0) == 1)

This resulting in a TypeError: unhashable type. Traceback happened in the second line. weatherFile['DATE_INT'] is dtype: int64.

Upvotes: 4

Views: 1500

Answers (2)

SO44
SO44

Reputation: 1329

You can use .shift(-1) or .shift(1) to compare consecutive entries:

df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True

Will set CONSECUTIVE_DAY to TRUE if the previous entry is the previous day

df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True

Will set CONSECUTIVE_DAY to TRUE if the entry is preceeded by or followed by a consecutive date.

Upvotes: 4

karakfa
karakfa

Reputation: 67467

Once you have the ordinal numbers it's a trivial task, here I'm using numpy arrays to propose one alternative

a=np.array([1,2,4,6,7,10,12,13,14,20])
d=a[1:]-a[:-1]                                            # compute delta
ind=np.logical_or(np.insert(d,0,0)==1,np.append(d,0)==1)  # at least one side matches
a[ind]                                                    # get matching entries

gives you the numbers where there is a consecutive number

array([ 1,  2,  6,  7, 12, 13, 14])

namely 4, 10 and 20 are removed.

Upvotes: 2

Related Questions