hernanavella
hernanavella

Reputation: 5552

How to subset a Pandas dataframe applying a function by date?

This is a simplified version of my data:

Date and Time           Price   Volume
2015-01-01 17:00:00.211 2030.25 342
2015-01-01 17:00:02.456 2030.75 725
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:00.074 2031.00 101
2015-01-02 17:00:16.221 2031.75 245
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
2015-01-03 17:00:25.098 2031.75 849
2015-01-03 17:00:45.188 2031.75 549

I would like subset the dataframe selecting the min of the column 'Volume' each day, with the corresponding 'Date and Time' and 'Price'. The output would be:

Date and Time           Price   Volume
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180

Thanks

Upvotes: 1

Views: 1167

Answers (2)

Boa
Boa

Reputation: 2677

Group the rows by day, then get the row with the minimum Volume, for each day:

from pandas import DatetimeIndex, DataFrame

df = DataFrame(...)
times = DatetimeIndex(df['Date and Time'])
grouped = df.groupby([times.day])

# takes DataFrame as input; returns the DataFrame row with the lowest 'Volume'
find_min = lambda cur_df: cur_df.ix[cur_df['Volume'].idxmin()]
# assemble a DataFrame from Series objects
result = DataFrame([find_min(x[1]) for x in grouped])
result = result.reset_index(drop=True)   # optional re-indexing

print result    

The output:

             Date and Time    Price Volume
0  2015-01-01 17:00:02.666  2030.75    203
1  2015-01-02 17:00:25.882  2031.75    100
2  2015-01-03 17:00:00.054  2031.00    180

Upvotes: 1

Joe T. Boka
Joe T. Boka

Reputation: 6581

The easiest way to do it is if you separate the Date and Time into two separate columns. As you said in your post you need "min of the column 'Volume' each day".

    Date        Time            Price    Volume
0   2015-01-01  17:00:00.211    2030.25     342
1   2015-01-01  17:00:02.456    2030.75     725
2   2015-01-01  17:00:02.666    2030.75     203
3   2015-01-02  17:00:00.074    2031.00     101
4   2015-01-02  17:00:16.221    2031.75     245
5   2015-01-02  17:00:25.882    2031.75     100
6   2015-01-03  17:00:00.054    2031.00     180
7   2015-01-03  17:00:25.098    2031.75     849
8   2015-01-03  17:00:45.188    2031.75     549

df = df.groupby('Date')['Volume'].min()
print (df)

The output is the minimum of the column Volume each day.

Date
2015-01-01    203
2015-01-02    100
2015-01-03    180
Name: Volume, dtype: object

EDIT : If you want to also get the indices of the original DataFrame (corresponding Time and the Price) you can do this instead:

idx = df.groupby(['Date'])['Volume'].transform(min) == df['Volume']
df[idx]

In this case the output:

    Date        Time            Price    Volume
2   2015-01-01  17:00:02.666    2030.75     203
5   2015-01-02  17:00:25.882    2031.75     100
6   2015-01-03  17:00:00.054    2031.00     180

Upvotes: 2

Related Questions