Reputation: 255
I have a dataframe with timestamps as a column, and I want to filter rows between 8:00:00 and 17:00:00 with np.where. I keep getting error messages about data/object types. Any help would be appreciated.
Example:
timestamp volume
2013-03-01 07:59:00 5
2013-03-01 08:00:00 6
2013-03-01 08:01:00 7
2013-03-01 08:02:00 8
Basically I want to end with:
2013-03-01 08:00:00 6
2013-03-01 08:01:00 7
2013-03-01 08:02:00 8
by using a method along the lines of
np.where(df['timestamp'] > dt.time('8:00:00'))
Upvotes: 4
Views: 18059
Reputation: 811
If you have a file with data as below:
timestamp volume
2013-03-01 07:59:00 5
2013-03-01 08:00:00 6
2013-03-01 08:01:00 7
2013-03-01 08:02:00 8
then you can skip the first data row while reading, and you will get this output:
timestamp volume
2013-03-01 08:00:00 6
2013-03-01 08:01:00 7
2013-03-01 08:02:00 8
import pandas as pd
# skiprows=[1] skips only the first data row (file line index 1) and keeps the header;
# skiprows=1 would skip the header line instead
df = pd.read_csv("filename", skiprows=[1])
print(df)
Upvotes: 0
Reputation: 2493
You can use between.
I generated a sample dataframe with
import datetime
import pandas as pd
# 20 rows, one hour apart, starting from the current time
d = {'timestamp': pd.Series([datetime.datetime.now() +
                             datetime.timedelta(hours=i) for i in range(20)]),
     'volume': pd.Series([s for s in range(20)])}
df = pd.DataFrame(d)
df['timestamp']
is
0 2017-02-13 22:37:54.515840
1 2017-02-13 23:37:54.515859
2 2017-02-14 00:37:54.515865
3 2017-02-14 01:37:54.515870
4 2017-02-14 02:37:54.515878
5 2017-02-14 03:37:54.515884
6 2017-02-14 04:37:54.515888
...
17 2017-02-14 15:37:54.515939
18 2017-02-14 16:37:54.515943
19 2017-02-14 17:37:54.515948
df.dtypes
timestamp datetime64[ns]
volume int64
dtype: object
As in your example the dtype of df['timestamp'] is object, you can first convert it with
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
With errors='coerce' (older pandas versions spelled this coerce=True), any string that fails to parse is set to NaT.
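A minimal sketch of the conversion step, assuming a current pandas version where the keyword is errors='coerce'. The two-element series is a hypothetical sample, not data from the question; it only shows that an unparseable string becomes NaT instead of raising.

```python
import pandas as pd

# Hypothetical sample: one valid timestamp string and one that cannot be parsed.
raw = pd.Series(['2013-03-01 08:00:00', 'not a timestamp'])

# errors='coerce' turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# The result is a datetime column, so the .dt accessor is available.
print(pd.api.types.is_datetime64_any_dtype(parsed))
print(parsed.isna().tolist())
```

Rows that ended up as NaT can then be dropped with dropna() before filtering, if that suits the data.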
Then filtering can be done using between as below
df[df.timestamp.dt.strftime('%H:%M:%S').between('11:00:00','18:00:00')]
will return
13 2017-02-14 11:37:54.515922 13
14 2017-02-14 12:37:54.515926 14
15 2017-02-14 13:37:54.515930 15
16 2017-02-14 14:37:54.515935 16
17 2017-02-14 15:37:54.515939 17
18 2017-02-14 16:37:54.515943 18
19 2017-02-14 17:37:54.515948 19
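The strftime approach compares zero-padded strings, which sort the same way as the times they represent. As a possible alternative closer to what the question attempted, the sketch below compares the time of day against datetime.time objects and then feeds the boolean mask to np.where. It rebuilds the four rows from the question; the variable names (times, mask, positions) are only illustrative.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2013-03-01 07:59:00',
        '2013-03-01 08:00:00',
        '2013-03-01 08:01:00',
        '2013-03-01 08:02:00',
    ]),
    'volume': [5, 6, 7, 8],
})

# Extract the time of day and compare it with datetime.time bounds (both inclusive).
times = df['timestamp'].dt.time
mask = (times >= dt.time(8, 0)) & (times <= dt.time(17, 0))

# Plain boolean indexing is usually enough.
filtered = df[mask]

# np.where on a boolean array returns the positions of the True entries.
positions = np.where(mask.to_numpy())[0]
filtered_np = df.iloc[positions]

print(filtered)
```

Note that dt.time takes integers (hour, minute, second), not a string such as '8:00:00', which is one likely source of the type errors mentioned in the question.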
Upvotes: 2
Reputation: 210832
Try this:
In [226]: df
Out[226]:
timestamp volume
0 2013-03-01 07:59:00 5
1 2013-03-01 08:00:00 6
2 2013-03-01 08:01:00 7
3 2013-03-01 08:02:00 8
In [227]: df.dtypes
Out[227]:
timestamp object
volume int64
dtype: object
In [228]: df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')
In [229]: df.dtypes
Out[229]:
timestamp datetime64[ns] # <---- it's `datetime64[ns]` now
volume int64
dtype: object
In [230]: df.set_index('timestamp').between_time('08:00','17:00').reset_index()
Out[230]:
timestamp volume
0 2013-03-01 08:00:00 6
1 2013-03-01 08:01:00 7
2 2013-03-01 08:02:00 8
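If setting and resetting the index feels heavy, one possible variation is DatetimeIndex.indexer_between_time, which returns integer row positions directly. This is a sketch under the assumption that the column is already datetime64; the sample frame repeats the question's data, and idx, positions and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question, already converted to datetime64.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2013-03-01 07:59:00',
        '2013-03-01 08:00:00',
        '2013-03-01 08:01:00',
        '2013-03-01 08:02:00',
    ]),
    'volume': [5, 6, 7, 8],
})

# Build a DatetimeIndex from the column without touching the frame's own index.
idx = pd.DatetimeIndex(df['timestamp'])

# Integer positions of rows whose time of day lies in [08:00, 17:00].
positions = idx.indexer_between_time('08:00', '17:00')

result = df.iloc[positions]
print(result)
```

Unlike the set_index/reset_index version, this keeps the original row labels (1, 2, 3 here) rather than renumbering from 0.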
Upvotes: 2