Reputation: 884
I am working on a big pandas DataFrame that has a sorted time column. I want to drop any dates that are duplicates or fall within a delta of one day (say) of the previous date. In addition, I want to create another column holding a label based on the time. A simple example (keeping only the time column for simplicity):
df:
Time Column
2020-12-05
2020-12-09
2020-12-13
2020-12-18
2020-12-21
2020-12-21
2020-12-23
2020-12-24
2020-12-31
2021-01-04
What I am looking for is an output DataFrame that looks like this, after dropping duplicates and dates within a day of each other (keeping only the first occurrence: in the example, of 2020-12-23 and 2020-12-24, only the 23rd is kept) and adding a Label column:
df:
Time Column Label
2020-12-05 L1
2020-12-09 L2
2020-12-13 L3
2020-12-18 L4
2020-12-21 L5
2020-12-23 L6
2020-12-31 L7
2021-01-04 L8
I have no idea how to achieve this. Any help in this direction will be immensely appreciated.
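For reference, the sample frame above can be built like this (the column name `time_column` is an assumption, taken from the attempt posted below):

```python
import pandas as pd

# Sample data from the question, parsed to datetime
df = pd.DataFrame({'time_column': [
    '2020-12-05', '2020-12-09', '2020-12-13', '2020-12-18',
    '2020-12-21', '2020-12-21', '2020-12-23', '2020-12-24',
    '2020-12-31', '2021-01-04',
]})
df['time_column'] = pd.to_datetime(df['time_column'])
```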
Upvotes: 1
Views: 42
Reputation: 884
I did something like this:
df = pd.DataFrame(dt)  # dt holds the raw data
df['time_column'] = pd.to_datetime(df['time_column'])

# Keep a renamed copy of the dates alongside the row-to-row differences
mdf = df.copy().rename(columns={'time_column': 'date_column'})
myf = df.diff(periods=1)
finDF = myf.join(mdf, how='inner')

# Drop rows whose gap from the previous row is 0 or 1 days
# (use real Timedeltas for the mask rather than strings)
mask = pd.to_timedelta(['0 days', '1 days'])
finDF = finDF[~finDF['time_column'].isin(mask)]
del finDF['time_column']

# Number the remaining rows L1, L2, ...
finDF = finDF.assign(Label='L' + finDF['date_column'].notna().cumsum().astype(str))
finDF
Upvotes: 0
Reputation: 26676
df['Time_Column'] = pd.to_datetime(df['Time_Column'])  # Coerce Time_Column to datetime
df = df[df['Time_Column'].diff(1).dt.days.ne(1)].drop_duplicates()  # Drop dates one day after the previous row, then exact duplicates
df = df.assign(Label='L' + df['Time_Column'].notna().cumsum().astype(str))  # Number the remaining rows
print(df)
Time_Column Label
0 2020-12-05 L1
1 2020-12-09 L2
2 2020-12-13 L3
3 2020-12-18 L4
4 2020-12-21 L5
6 2020-12-23 L6
8 2020-12-31 L7
9 2021-01-04 L8
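One caveat: a row-to-row diff can over-drop when three or more dates each sit one day apart, since the third date may be more than a day away from the last *kept* date. A sketch of a greedy pass that compares each date to the most recently kept one instead (the function name and configurable `delta` are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'Time_Column': pd.to_datetime([
    '2020-12-05', '2020-12-09', '2020-12-13', '2020-12-18',
    '2020-12-21', '2020-12-21', '2020-12-23', '2020-12-24',
    '2020-12-31', '2021-01-04'])})

def drop_within_delta(dates, delta=pd.Timedelta(days=1)):
    """Keep the first date of a sorted datetime Series, then each date
    strictly more than `delta` after the most recently kept one."""
    keep, last = [], None
    for ts in dates:
        if last is None or ts - last > delta:
            keep.append(True)
            last = ts
        else:
            keep.append(False)
    return pd.Series(keep, index=dates.index)

out = df[drop_within_delta(df['Time_Column'])].copy()
out['Label'] = 'L' + pd.Series(range(1, len(out) + 1), index=out.index).astype(str)
```

On the sample data both approaches agree; they differ only on longer runs of consecutive days.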
Upvotes: 1
Reputation: 346
(df.assign(dt_day=lambda x: x['time column'].dt.date)
.drop_duplicates('dt_day', keep='first')
.drop(columns=['dt_day']))
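Note this removes exact duplicate calendar dates only, not dates within a one-day delta. To also produce the Label column from the question, a row counter can be appended afterwards; a sketch using the question's sample data (the 'time column' name follows this answer):

```python
import pandas as pd

df = pd.DataFrame({'time column': pd.to_datetime([
    '2020-12-05', '2020-12-09', '2020-12-13', '2020-12-18',
    '2020-12-21', '2020-12-21', '2020-12-23', '2020-12-24',
    '2020-12-31', '2021-01-04'])})

# Drop rows sharing the same calendar date, keeping the first
out = (df.assign(dt_day=lambda x: x['time column'].dt.date)
         .drop_duplicates('dt_day', keep='first')
         .drop(columns=['dt_day']))

# Label the surviving rows L1, L2, ... in order
out['Label'] = 'L' + pd.Series(range(1, len(out) + 1), index=out.index).astype(str)
```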
Upvotes: 0