Reputation: 5
I have a DataFrame loaded from a CSV file.
I would like to know at what time between 23:00 and 23:50 the value in column "First" is most likely to be 0.
Date First Second
0 2019-01-09 22:59:00 0 20
1 2019-01-09 23:04:00 14 32
2 2019-01-09 23:10:00 9 27
3 2019-01-09 23:11:00 7 27
4 2019-01-09 23:12:00 7 26
5 2019-01-09 23:13:00 7 26
6 2019-01-09 23:14:00 7 25
7 2019-01-09 23:15:00 6 25
8 2019-01-09 23:16:00 5 23
9 2019-01-09 23:17:00 4 22
10 2019-01-09 23:18:00 3 22
... ... ... ...
134761 2019-05-05 21:20:00 18 36
134762 2019-05-05 21:21:00 16 35
134763 2019-05-05 21:22:00 15 34
134764 2019-05-05 21:23:00 14 33
I use this code to select the expected time range:
heure = df.set_index('Date').between_time('23:00:00','23:50:00')
But I wasn't able to extract just the time.
If you have any suggestions :)
Thanks,
Robin
Upvotes: 0
Views: 72
Reputation: 59519
Filter based on time. Then find the most common time where First is 0.
try:
    (df.set_index('Date').between_time('23:00:00', '23:50:00').reset_index()
       .loc[lambda x: x.First == 0].Date.dt.time.value_counts().index[0])
except IndexError:
    print('No matches')
This will return a datetime.time or, in the case of your sample data, print that nothing matches, as there are no 0s between the specified times.
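If what you want is a probability per clock time rather than a raw count, one possible variation is to group the filtered rows by time of day and take the mean of the `First == 0` condition. This is only a sketch: the sample data below is made up for illustration, and the names `window`, `prob_zero` and `best_time` are not from the question.

```python
import pandas as pd

# Hypothetical sample data: two days with readings at the same clock times.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-01-09 23:04:00", "2019-01-09 23:10:00", "2019-01-09 23:55:00",
        "2019-01-10 23:04:00", "2019-01-10 23:10:00", "2019-01-10 23:55:00",
    ]),
    "First": [0, 0, 0, 3, 0, 0],
})

# Keep only rows whose time of day falls in the window (both ends included).
window = df.set_index("Date").between_time("23:00:00", "23:50:00")

# For each clock time, the share of rows where First is 0.
prob_zero = (window["First"] == 0).groupby(window.index.time).mean()

# Clock time with the highest share (the first one if there is a tie).
best_time = prob_zero.idxmax()

print(prob_zero)
print(best_time)
```

Unlike `value_counts`, this divides by how many readings exist at each clock time, so a time that simply has more rows is not favoured.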
Upvotes: 1
Reputation: 1284
How about using dt accessors? Updated with an end to end example for your use case.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        'date': [
            pd.to_datetime('2019-01-09 22:59:00'),
            pd.to_datetime('2019-01-09 23:00:00'),
            pd.to_datetime('2019-01-09 23:49:59'),
            pd.to_datetime('2019-01-09 23:50:00'),
            pd.to_datetime('2019-01-09 23:51:00'),
        ],
        'value': [0, 0, 5, 6, 1]
    }
)
# A mask to split the dataset into two groups, based on the time.
df['in_range'] = np.where((df['date'].dt.hour == 23) & (df['date'].dt.minute < 50), 'In Range', 'Out of Range')
# A column that tests the condition you mentioned
df['condition'] = df['value'] == 0
# Group and get the average, which is the likelihood that value == 0, per group.
print(df.groupby('in_range')['condition'].mean())
Gives:
in_range
In Range 0.500000
Out of Range 0.333333
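One thing to keep in mind: the `minute < 50` mask leaves out 23:50:00 itself, whereas `between_time('23:00:00', '23:50:00')` from the question includes both endpoints by default. A small sketch of the difference, reusing the same sample rows (the names `mask_minute`, `in_minute` and `in_between` are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-09 22:59:00", "2019-01-09 23:00:00", "2019-01-09 23:49:59",
        "2019-01-09 23:50:00", "2019-01-09 23:51:00",
    ]),
    "value": [0, 0, 5, 6, 1],
})

# Hour/minute mask: covers 23:00:00 up to 23:49:59, so 23:50:00 is excluded.
mask_minute = (df["date"].dt.hour == 23) & (df["date"].dt.minute < 50)
in_minute = df[mask_minute]

# between_time: both endpoints are included by default, so 23:50:00 is kept.
in_between = df.set_index("date").between_time("23:00:00", "23:50:00")

print(len(in_minute), (in_minute["value"] == 0).mean())
print(len(in_between), (in_between["value"] == 0).mean())
```

Which of the two is right depends on whether you consider 23:50 part of the window.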
Upvotes: 1
Reputation: 2318
You should convert your "Date" column to datetime type first; then you can filter using the dt accessor, as mentioned by @smj.
import pandas as pd
df = pd.read_csv('./sample.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df[(df['Date'].dt.hour == 23) & (df['Date'].dt.minute < 50)])
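Since the question also asks how to extract just the time, here is a minimal sketch of two ways to do that once the column is a datetime. The small DataFrame stands in for the CSV, and the column names `time` and `hhmm` are made up for this example.

```python
import pandas as pd

# Stand-in for the CSV contents; dates start out as plain strings.
df = pd.DataFrame({
    "Date": ["2019-01-09 22:59:00", "2019-01-09 23:04:00", "2019-01-09 23:50:00"],
    "First": [0, 14, 0],
})
df["Date"] = pd.to_datetime(df["Date"])

# datetime.time objects, one per row.
df["time"] = df["Date"].dt.time

# Zero-padded "HH:MM" strings, which compare correctly as text.
df["hhmm"] = df["Date"].dt.strftime("%H:%M")

# Series.between includes both bounds by default.
in_window = df[df["hhmm"].between("23:00", "23:50")]
print(in_window[["time", "First"]])
```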
Upvotes: 0