marko

Reputation: 103

Python pandas: filtering rows based on time criteria

I have a CSV file with millions of rows in the following format:

Amount,Price,Time
0.36,13924.98,2010-01-01 00:00:08
0.01,13900.09,2010-01-01 00:02:04
0.02,13907.59,2010-01-01 00:04:54
0.07,13907.59,2010-01-01 00:05:03
0.03,13925,2010-01-01 00:05:41
0.03,13920,2010-01-01 00:07:02
0.15,13910,2010-01-01 00:09:37
0.03,13909.99,2010-01-01 00:09:58
0.03,13909.99,2010-01-01 00:10:03
0.14,13909.99,2010-01-01 00:10:03

I want to first filter this data and then perform some calculations on the filtered data. I import it with pandas using data = pd.read_csv(), which gives me a DataFrame.

I then convert the Time column to a TimeDelta column (I am not sure this is necessary for what I want to do), which holds the number of minutes elapsed since 2010-01-01 00:00:00, using

data['TimeDelta'] = (pd.to_datetime(data['Time']) - pd.Timestamp('2010-01-01')) / np.timedelta64(1, 'm')

This is the part I struggle with. I want a function that returns a new DataFrame containing only the first row of each n-minute interval, where n is an integer chosen by the user.

For example, if n=5, the desired output of this function for my data would be:

Amount,Price,Time
0.36,13924.98,2010-01-01 00:00:08
0.07,13907.59,2010-01-01 00:05:03
0.03,13909.99,2010-01-01 00:10:03

And the output for n=3 would be:

Amount,Price,Time
0.36,13924.98,2010-01-01 00:00:08
0.02,13907.59,2010-01-01 00:04:54
0.15,13910,2010-01-01 00:09:37

I have tried doing this using floor and the remainder operator %, but being a beginner with Python I have not been able to get it working.

Upvotes: 0

Views: 67

Answers (1)

yatu

Reputation: 88226

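Use pd.Grouper to group the rows into n-minute bins and take the first row of each bin. The Time column must have a datetime dtype for this to work:

n = 5
df['Time'] = pd.to_datetime(df['Time'])  # pd.Grouper with freq needs a datetime column
df.groupby(pd.Grouper(key = 'Time', freq=f'{n} min')).first()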

                      Amount   Price
Time                                 
2010-01-01 00:00:00    0.36  13924.98
2010-01-01 00:05:00    0.07  13907.59
2010-01-01 00:10:00    0.03  13909.99
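
Note that the result above is indexed by the start of each bin, so the original timestamps are lost. If they should be kept, as in the desired output in the question, one possible sketch is to label each row with its window start and keep the first row per label. The helper name first_row_per_window and its time_col parameter are made up for illustration, and the sample data is copied from the question:

```python
import io

import pandas as pd


def first_row_per_window(df: pd.DataFrame, n: int, time_col: str = "Time") -> pd.DataFrame:
    """Return the first row of each n-minute window, keeping original timestamps.

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    # Stable sort so rows sharing a timestamp keep their file order.
    out = out.sort_values(time_col, kind="stable")
    # Label each row with the start of its window, e.g. 00:05:03 -> 00:05:00 for n=5.
    # floor() aligns windows to multiples of n minutes since the Unix epoch,
    # which coincides with midnight whenever n divides 1440.
    window = out[time_col].dt.floor(f"{n}min")
    # Keep only the first row seen for each window label.
    return out[~window.duplicated()].reset_index(drop=True)


csv_text = """Amount,Price,Time
0.36,13924.98,2010-01-01 00:00:08
0.01,13900.09,2010-01-01 00:02:04
0.02,13907.59,2010-01-01 00:04:54
0.07,13907.59,2010-01-01 00:05:03
0.03,13925,2010-01-01 00:05:41
0.03,13920,2010-01-01 00:07:02
0.15,13910,2010-01-01 00:09:37
0.03,13909.99,2010-01-01 00:09:58
0.03,13909.99,2010-01-01 00:10:03
0.14,13909.99,2010-01-01 00:10:03
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(first_row_per_window(sample, 5))
```

On this sample, n=5 selects the rows at 00:00:08, 00:05:03 and 00:10:03. With n=3, fixed windows starting at midnight also select the 00:07:02 row, because it is the first row in the 00:06 to 00:09 window. Unlike the groupby version, this sketch produces no rows for windows that contain no data.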

Upvotes: 1
