Reputation: 327
I would like to identify dates in a dataframe which are consecutive, that is there exists either an immediate predecessor or successor. I would then like to mark which dates are and are not consecutive in a new column. Additionally I would like to do this operation within particular subsets of my data.
First I create a new variable where I'd identify True of False for Consecutive Days.
weatherFile['CONSECUTIVE_DAY'] = 'NA'
I've converted dates into datetime objects then to ordinal ones:
weatherFile['DATE_OBJ'] = [datetime.strptime(d, '%Y%m%d') for d in weatherFile['DATE']]
weatherFile['DATE_INT'] = list([d.toordinal() for d in weatherFile['DATE_OBJ']])
Now I would like to identify consecutive dates in the following groups:
weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])
I am thinking to loop through the groups and applying an operation that will identify which days are consecutive and which are not, within unique county, tempbin subsets.
I'm rather new to programming and python, is this a good approach so far, if so how can I progress?
Thank you - Let me know if I should provide additional information.
Update:
Using @karakfa advice I tried the following:
weatherFile.groupby(['COUNTY_GEOID_YEAR', 'TEMPBIN'])
weatherFile['DISTANCE'] = weatherFile[1:, 'DATE_INT'] - weatherFile[:-1,'DATE_INT']
weatherFile['CONSECUTIVE?'] = np.logical_or(np.insert((weatherFile['DISTANCE']),0,0) == 1, np.append((weatherFile['DISTANCE']),0) == 1)
This resulting in a TypeError: unhashable type. Traceback happened in the second line. weatherFile['DATE_INT'] is dtype: int64.
Upvotes: 4
Views: 1500
Reputation: 1329
You can use .shift(-1) or .shift(1) to compare consecutive entries:
df.loc[df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1, 'CONSECUTIVE_DAY'] = True
Will set CONSECUTIVE_DAY to TRUE if the previous entry is the previous day
df.loc[(df['DATE_INT'].shift(-1) - df['DATE_INT'] == 1) | (df['DATE_INT'].shift(1) - df['DATE_INT'] == -1), 'CONSECUTIVE_DAY'] = True
Will set CONSECUTIVE_DAY to TRUE if the entry is preceeded by or followed by a consecutive date.
Upvotes: 4
Reputation: 67467
Once you have the ordinal numbers it's a trivial task, here I'm using numpy arrays to propose one alternative
a=np.array([1,2,4,6,7,10,12,13,14,20])
d=a[1:]-a[:-1] # compute delta
ind=np.logical_or(np.insert(d,0,0)==1,np.append(d,0)==1) # at least one side matches
a[ind] # get matching entries
gives you the numbers where there is a consecutive number
array([ 1, 2, 6, 7, 12, 13, 14])
namely 4, 10 and 20 are removed.
Upvotes: 2