Reputation: 7387
I use pandas to retrieve a lot of data via an SQL query (from Hive). I now have a big DataFrame:
market_pings = pandas.read_sql_query(query, engine)
market_pings['event_time'] = pandas.to_datetime(market_pings['event_time'])
I have calculated time-delta periods: whenever something interesting happens in the timeline of the events in the market_pings DataFrame, I want the logs of that time interval only.
To grab DataFrame rows where a column has certain values there is a cool trick:
value_list = ['value1', 'value2', 'value3']
df = df[df.column.isin(value_list)]  # prefix the mask with ~ to exclude these values instead
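As a minimal sketch of that trick, assuming a made-up DataFrame with a column literally named "column": isin() builds a boolean mask that keeps the matching rows, and prefixing the mask with ~ inverts it so the matching rows are dropped instead.

```python
import pandas as pd

# Hypothetical sample data; only the column name "column" comes from the snippet above.
df = pd.DataFrame({
    "column": ["value1", "other", "value3", "value2"],
    "n": [1, 2, 3, 4],
})
value_list = ["value1", "value2", "value3"]

# Boolean mask: True where the column value appears in value_list.
mask = df["column"].isin(value_list)

kept = df[mask]      # rows whose value is in the list
dropped = df[~mask]  # rows whose value is NOT in the list
```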
Does anyone have an idea how to do this for time periods, so that I get the events from certain times out of the market_pings DataFrame without iterating row by row? I can build a list of periods (1 s accuracy) like:
2015-08-03 19:19:47
2015-08-03 19:20:00
But this means my value_list becomes a tuple and I somehow have to compare dates.
Upvotes: 2
Views: 51
Reputation: 4375
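You can build value_list as a list of timestamps and then run the operation you intend:
import numpy as np
import pandas as pd
time_list = [pd.Timestamp('2015-08-03 19:19:47'), pd.Timestamp('2015-08-03 19:20:00'),
             pd.Timestamp('2015-08-03 19:19:48'), pd.Timestamp('2015-08-03 21:20:00')]
One caveat with between_time(): the index has to hold the dates or times (a DatetimeIndex). If it does not, you can set it with set_index(). Also note that between_time() compares the time of day only and ignores the date.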
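For a DataFrame like market_pings, where the timestamps live in a column rather than in the index, the set_index() step might look like the sketch below. The sample rows and the "ping" column are made up for illustration; only the event_time column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for market_pings; only event_time comes from the question.
market_pings = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2015-08-03 19:19:47",
        "2015-08-03 19:20:00",
        "2015-08-03 19:19:48",
        "2015-08-03 21:20:00",
    ]),
    "ping": [1, 2, 3, 4],  # made-up payload column
})

# between_time() needs a DatetimeIndex, so move event_time into the index.
indexed = market_pings.set_index("event_time")

# Rows whose time of day lies between 19:19:47 and 19:20:00 (both ends included
# by default); the date part of the index is ignored.
window = indexed.between_time("19:19:47", "19:20:00")
```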
mydf = pd.Series(np.random.randn(4), time_list)
mydf
Out[123]:
2015-08-03 19:19:47 0.632509
2015-08-03 19:20:00 -0.234267
2015-08-03 19:19:48 0.159056
2015-08-03 21:20:00 -0.842017
dtype: float64
# pandas 1.4+ uses inclusive=; older versions spell this include_end=False
mydf.between_time(start_time='19:19:47', end_time='19:20:00', inclusive='left')
Out[124]:
2015-08-03 19:19:47 0.632509
2015-08-03 19:19:48 0.159056
dtype: float64
# pandas 1.4+ uses inclusive=; older versions spell this include_start=False, include_end=False
mydf.between_time(start_time='19:19:47', end_time='19:20:00', inclusive='neither')
Out[125]:
2015-08-03 19:19:48 0.159056
dtype: float64
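Since between_time() matches on the time of day only, it would also pick up the same clock times on other dates. If the periods are full (start, end) datetimes, as in the question, one possible approach is to OR together one Series.between() mask per period. This is a sketch, not the only way to do it: the helper name in_any_period, the sample rows and the second period are assumptions made for illustration. The loop runs over the periods, not over the rows of the DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for market_pings; only event_time comes from the question.
market_pings = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2015-08-03 19:19:47",
        "2015-08-03 19:19:50",
        "2015-08-03 19:20:00",
        "2015-08-03 19:25:00",
        "2015-08-03 21:20:00",
        "2015-08-04 19:19:50",  # same clock time, next day
    ]),
    "ping": [1, 2, 3, 4, 5, 6],  # made-up payload column
})

# Each period is a (start, end) tuple of full timestamps.
periods = [
    (pd.Timestamp("2015-08-03 19:19:47"), pd.Timestamp("2015-08-03 19:20:00")),
    (pd.Timestamp("2015-08-03 21:00:00"), pd.Timestamp("2015-08-03 21:30:00")),
]


def in_any_period(times, periods):
    """Return a boolean mask: True where a timestamp falls in any period."""
    mask = pd.Series(False, index=times.index)
    for start, end in periods:
        # between() is vectorized and includes both endpoints by default.
        mask |= times.between(start, end)
    return mask


selected = market_pings[in_any_period(market_pings["event_time"], periods)]
```

With many thousands of periods the per-period loop can itself become slow; for a handful of intervals it is usually adequate.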
Upvotes: 1