Reputation: 11
here is my code and datetime columns.
import pandas as pd
xcel_file=pd.read_excel('data.xlsx',usecols=['datetime'])
date=[]
time=[]
date.append((xcel_file['datetime']).dt.date)
time.append((xcel_file['datetime']).dt.time)
new_file=pd.DataFrame({'a':len(xcel_file['datetime'])},index=xcel_file['datetime'])
day=new_file.between_time('9:00','16:00')
day.reset_index(inplace=True)
day=day.drop(columns={'a'})
day['time']=pd.to_datetime(day['datetime']).dt.date
model_list=day['time'].drop_duplicates()
data_set=[]
i=0
for n in day['datetime']:
data_2=max(day['datetime'][day['time']==model_list[i])
i+=1
data_set.append(data_2)
datetime column
0 2022-01-10 09:30:00
1 2022-01-10 10:30:00
2 2022-01-11 10:30:00
3 2022-01-11 15:30:00
4 2022-01-11 11:00:00
5 2022-01-11 12:00:00
6 2022-01-12 13:00:00
7 2022-01-12 15:30:00
8 2022-01-13 14:00:00
9 2022-01-14 15:00:00
10 2022-01-14 16:00:00
11 2022-01-14 16:30:00
expected result
1 2022-01-10 10:30:00
3 2022-01-11 15:30:00
7 2022-01-12 15:30:00
8 2022-01-13 14:00:00
9 2022-01-14 15:00:00
I'm trying to get max value of same dates from datetime column in between time 9am to 4pm. Is there any way of doing this? Truly thankful for any kind of help.
Upvotes: 1
Views: 1730
Reputation: 862641
Use DataFrame.between_time
with aggregate by days in Grouper
for maximal datetimes:
df = pd.read_excel('data.xlsx',usecols=['datetime'])
df = df.set_index('datetime', drop=False)
df = (df.between_time('9:00','16:00')
.groupby(pd.Grouper(freq='d'))[['datetime']]
.max()
.reset_index(drop=True))
print (df)
datetime
0 2022-01-10 10:30:00
1 2022-01-11 15:30:00
2 2022-01-12 15:30:00
3 2022-01-13 14:00:00
4 2022-01-14 16:00:00
EDIT: Added missing values if exist match, so DataFrame.dropna
solve this problem.
print (df)
datetime
0 2022-01-10 17:40:00
1 2022-01-10 19:30:00
2 2022-01-11 19:30:00
3 2022-01-11 15:30:00
4 2022-01-12 19:30:00
5 2022-01-12 15:30:00
6 2022-01-14 18:30:00
7 2022-01-14 16:30:00
df = df.set_index('datetime', drop=False)
df = (df.between_time('17:00','19:30')
.groupby(pd.Grouper(freq='d'))[['datetime']]
.max()
.dropna()
.reset_index(drop=True))
print (df)
datetime
0 2022-01-10 19:30:00
1 2022-01-11 19:30:00
2 2022-01-12 19:30:00
3 2022-01-14 18:30:00
Added alternative solution:
df = df.set_index('datetime', drop=False)
df = (df.between_time('17:00','19:30')
.sort_index()
.assign(d = lambda x: x['datetime'].dt.date)
.drop_duplicates('d', keep='last')
.drop('d', axis=1)
.reset_index(drop=True)
)
print (df)
datetime
0 2022-01-10 19:30:00
1 2022-01-11 19:30:00
2 2022-01-12 19:30:00
3 2022-01-14 18:30:00
EDIT: solution for filter first by datetime
column, then datetime2
and last filtering by date
s from datetime
column:
print (df)
datetime datetime2
0 2022-01-10 09:30:00 2022-01-10 17:40:00
1 2022-01-10 10:30:00 2022-01-10 19:30:00
2 2022-01-11 10:30:00 2022-01-11 19:30:00
3 2022-01-11 15:30:00 2022-01-11 15:30:00
4 2022-01-11 11:00:00 2022-01-12 15:30:00
5 2022-01-11 12:00:00 2022-01-14 18:30:00
6 2022-01-12 13:00:00 2022-01-14 16:30:00
7 2022-01-12 15:30:00 2022-01-14 17:30:00
df = (df.set_index('datetime', drop=False)
.between_time('9:00','16:00')
.sort_index()
.set_index('datetime2', drop=False)
.between_time('17:00','19:30')
.assign(d = lambda x: x['datetime'].dt.date)
.drop_duplicates('d', keep='last')
.drop('d', axis=1)
.reset_index(drop=True)
)
print (df)
datetime datetime2
0 2022-01-10 10:30:00 2022-01-10 19:30:00
1 2022-01-11 12:00:00 2022-01-14 18:30:00
2 2022-01-12 15:30:00 2022-01-14 17:30:00
If filtering by date
s by datetim2
output is different:
df = (df.set_index('datetime', drop=False)
.between_time('9:00','16:00')
.sort_index()
.set_index('datetime2', drop=False)
.between_time('17:00','19:30')
.assign(d = lambda x: x['datetime2'].dt.date)
.drop_duplicates('d', keep='last')
.drop('d', axis=1)
.reset_index(drop=True)
)
print (df)
datetime datetime2
0 2022-01-10 10:30:00 2022-01-10 19:30:00
1 2022-01-11 10:30:00 2022-01-11 19:30:00
2 2022-01-12 15:30:00 2022-01-14 17:30:00
Upvotes: 3