Reputation: 5552
This is a simplified version of my data:
Date and Time Price Volume
2015-01-01 17:00:00.211 2030.25 342
2015-01-01 17:00:02.456 2030.75 725
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:00.074 2031.00 101
2015-01-02 17:00:16.221 2031.75 245
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
2015-01-03 17:00:25.098 2031.75 849
2015-01-03 17:00:45.188 2031.75 549
I would like subset the dataframe selecting the min of the column 'Volume' each day, with the corresponding 'Date and Time' and 'Price'. The output would be:
Date and Time Price Volume
2015-01-01 17:00:02.666 2030.75 203
2015-01-02 17:00:25.882 2031.75 100
2015-01-03 17:00:00.054 2031.00 180
Thanks
Upvotes: 1
Views: 1167
Reputation: 2677
Group the rows by day, then get the row with the minimum Volume, for each day:
from pandas import DatetimeIndex, DataFrame
df = DataFrame(...)
times = DatetimeIndex(df['Date and Time'])
grouped = df.groupby([times.day])
# takes DataFrame as input; returns the DataFrame row with the lowest 'Volume'
find_min = lambda cur_df: cur_df.ix[cur_df['Volume'].idxmin()]
# assemble a DataFrame from Series objects
result = DataFrame([find_min(x[1]) for x in grouped])
result = result.reset_index(drop=True) # optional re-indexing
print result
The output:
Date and Time Price Volume
0 2015-01-01 17:00:02.666 2030.75 203
1 2015-01-02 17:00:25.882 2031.75 100
2 2015-01-03 17:00:00.054 2031.00 180
Upvotes: 1
Reputation: 6581
The easiest way to do it is if you separate the Date
and Time
into two separate columns. As you said in your post you need "min of the column 'Volume' each day".
Date Time Price Volume
0 2015-01-01 17:00:00.211 2030.25 342
1 2015-01-01 17:00:02.456 2030.75 725
2 2015-01-01 17:00:02.666 2030.75 203
3 2015-01-02 17:00:00.074 2031.00 101
4 2015-01-02 17:00:16.221 2031.75 245
5 2015-01-02 17:00:25.882 2031.75 100
6 2015-01-03 17:00:00.054 2031.00 180
7 2015-01-03 17:00:25.098 2031.75 849
8 2015-01-03 17:00:45.188 2031.75 549
df = df.groupby('Date')['Volume'].min()
print (df)
The output is the minimum of the column Volume
each day.
Date
2015-01-01 203
2015-01-02 100
2015-01-03 180
Name: Volume, dtype: object
EDIT : If you want to also get the indices of the original DataFrame (corresponding Time and the Price) you can do this instead:
idx = df.groupby(['Date'])['Volume'].transform(min) == df['Volume']
df[idx]
In this case the output:
Date Time Price Volume
2 2015-01-01 17:00:02.666 2030.75 203
5 2015-01-02 17:00:25.882 2031.75 100
6 2015-01-03 17:00:00.054 2031.00 180
Upvotes: 2