Reputation: 510
I have a DataFrame like this:
timestamp Order Price Quantity
0 2019-10-09 09:15:42 0 27850.00 2040
1 2019-10-09 09:15:42 0 27850.00 1980
2 2019-10-09 09:15:53 0 27860.85 1800
3 2019-10-09 09:16:54 0 27860.85 2340
4 2019-10-09 09:18:48 0 27860.85 1500
5 2019-10-09 09:21:08 0 27979.00 1840
6 2019-10-09 09:21:08 0 27979.00 2020
7 2019-10-09 09:21:12 0 27850.00 1800
8 2019-10-09 09:21:15 0 27850.00 1580
9 2019-10-09 09:21:21 35 28000.00 1840
10 2019-10-09 09:21:23 34 28000.00 1800
11 2019-10-09 09:28:17 0 28035.00 2020
12 2019-10-09 09:28:18 0 28035.00 1960
13 2019-10-09 09:28:18 0 28035.00 1920
14 2019-10-09 09:28:24 0 28035.00 1940
15 2019-10-09 09:28:24 0 28035.00 1960
16 2019-10-09 09:28:25 0 28000.00 2140
17 2019-10-09 09:28:25 0 28000.00 2020
18 2019-10-09 09:28:26 0 28000.00 2120
When successive rows have the same Price, I want to keep only the row with the maximum Quantity from that run.
The result DataFrame should look like this:
timestamp Order Price Quantity
0 2019-10-09 09:15:42 0 27850.00 2040
3 2019-10-09 09:16:54 0 27860.85 2340
6 2019-10-09 09:21:08 0 27979.00 2020
7 2019-10-09 09:21:12 0 27850.00 1800
9 2019-10-09 09:21:21 35 28000.00 1840
11 2019-10-09 09:28:17 0 28035.00 2020
16 2019-10-09 09:28:25 0 28000.00 2140
PS: In the result table, Price 27850.00 appears again in row 7 and should be treated as a separate, independent run. The same applies to 28000.00.
Upvotes: 6
Views: 262
Reputation: 9868
First create a price_group column to identify consecutive rows with the same price (as in this answer).
price_group = (df.Price != df.Price.shift()).cumsum()
Then group the rows by this column and find the rows with max quantity for each group (as in these answers).
result = df.loc[df.Quantity.groupby(price_group).idxmax()]
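To check the two lines above against the question's data, here is a small self-contained sketch. It rebuilds only the Price and Quantity columns from the question (the other columns are left out for brevity, which is an assumption of this example) and prints the index labels of the selected rows.

```python
import pandas as pd

# Price and Quantity columns copied from the question (rows 0..18)
df = pd.DataFrame({
    "Price": [27850.00, 27850.00, 27860.85, 27860.85, 27860.85,
              27979.00, 27979.00, 27850.00, 27850.00, 28000.00,
              28000.00, 28035.00, 28035.00, 28035.00, 28035.00,
              28035.00, 28000.00, 28000.00, 28000.00],
    "Quantity": [2040, 1980, 1800, 2340, 1500,
                 1840, 2020, 1800, 1580, 1840,
                 1800, 2020, 1960, 1920, 1940,
                 1960, 2140, 2020, 2120],
})

# A new group starts whenever Price differs from the previous row
price_group = (df.Price != df.Price.shift()).cumsum()

# idxmax returns the index label of the first maximum in each group
result = df.loc[df.Quantity.groupby(price_group).idxmax()]

print(result.index.tolist())
```

The printed labels should match the row numbers in the question's expected result. When two rows in a run tie on Quantity, idxmax keeps the first one.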
Upvotes: 4
Reputation: 2492
This is not the slimmest solution, but I think it makes it more obvious what is happening. I'm sure it can be trimmed down to more concise code.
import pandas as pd

# Generating a similar df
df = pd.DataFrame({'Order'   : [1, 2, 3, 4, 5, 6, 7],
                   'Price'   : [27850.00, 27850.00, 27860.85, 27860.85, 27860.85, 27979.00, 27979.00],
                   'Quantity': [2040, 1980, 1800, 2340, 1500, 1840, 2020]
                   })
print(df)
print("--------------")

# Get the unique values from the Price column
# This tells us which values we want to select the highest value from
values = df["Price"].unique()

# Loop through the values, selecting the rows which match each value, one at a time
for value in values:
    # df["Price"] == value selects all the rows where Price equals this one value
    # For example, this gives us 3 rows where Price == 27860.85
    # .max() then takes the maximum of each column over those rows and returns a Series, i.e.
    # Order           5.00
    # Price       27860.85
    # Quantity     2340.00
    # ["Quantity"] then selects only the Quantity value and assigns it to highest
    highest = df[df["Price"] == value].max()["Quantity"]
    print(value, "...", highest)
    # You can, during this loop, build a new dict object to create a new df if desired
Or, more succinctly...
# Create a new list in one line
highest = [df[df["Price"] == value].max()["Quantity"] for value in df["Price"].unique()]

# Add as columns to new df
df1 = pd.DataFrame({
    'Price'   : df["Price"].unique(),
    'Quantity': highest
})
print(df1)
Use the same idea to grab the appropriate value from other columns for each unique Price, and add them to the new df1.
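One caveat with grouping on unique Price values: two separate runs of the same price (such as 27850.00 at rows 0-1 and again at rows 7-8 in the question) end up in the same group. A minimal sketch of the same explicit-loop idea that tracks consecutive runs instead, and keeps the whole winning row, might look like this. The extra eighth row and the names best and prev_price are assumptions added for illustration.

```python
import pandas as pd

# Same sample as above, plus one extra row (assumed) where 27850.00 reappears
df = pd.DataFrame({
    "Order": [1, 2, 3, 4, 5, 6, 7, 8],
    "Price": [27850.00, 27850.00, 27860.85, 27860.85, 27860.85,
              27979.00, 27979.00, 27850.00],
    "Quantity": [2040, 1980, 1800, 2340, 1500, 1840, 2020, 1800],
})

best = []          # index label of the best row found so far in each run
prev_price = None  # Price of the previous row; None before the first row

for idx, row in df.iterrows():
    if row["Price"] != prev_price:
        # Price changed, so a new run starts with this row as its best
        best.append(idx)
    elif row["Quantity"] > df.at[best[-1], "Quantity"]:
        # Same run, and this row beats the current best
        best[-1] = idx
    prev_price = row["Price"]

# Select the complete winning rows, so every column comes from the same row
df1 = df.loc[best]
print(df1)
```

Because the loop compares each row only with the one before it, the second appearance of 27850.00 is kept as its own run. The strict `>` means ties keep the earlier row.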
Upvotes: 0
Reputation: 1126
Something like this:
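from itertools import groupby
# Split the Price column into runs of consecutive equal values
x = [list(n) for m, n in groupby(df['Price'])]
# Pair each run with its position (0, 1, 2, ...)
y = [(ind, val) for ind, val in enumerate(x)]
# Repeat the run number once for every row in that run
z = [i[0] for i in y for j in i[1]]
df['label'] = z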
# this gives you a df like this
# Unnamed: 0 Unnamed: 1 timestamp Order Price Quantity label
# 0 0 09.10.2019 9:15:42 0 27850.00 2040 0
# 1 1 09.10.2019 9:15:42 0 27850.00 1980 0
# 2 2 09.10.2019 9:15:53 0 27860.85 1800 1
# 3 3 09.10.2019 9:16:54 0 27860.85 2340 1
# 4 4 09.10.2019 9:18:48 0 27860.85 1500 1
# 5 5 09.10.2019 9:21:08 0 27979.00 1840 2
# 6 6 09.10.2019 9:21:08 0 27979.00 2020 2
# 7 7 09.10.2019 9:21:12 0 27850.00 1800 3
# 8 8 09.10.2019 9:21:15 0 27850.00 1580 3
# 9 9 09.10.2019 9:21:21 35 28000.00 1840 4
# 10 10 09.10.2019 9:21:23 34 28000.00 1800 4
# 11 11 09.10.2019 9:28:17 0 28035.00 2020 5
# 12 12 09.10.2019 9:28:18 0 28035.00 1960 5
# 13 13 09.10.2019 9:28:18 0 28035.00 1920 5
# 14 14 09.10.2019 9:28:24 0 28035.00 1940 5
# 15 15 09.10.2019 9:28:24 0 28035.00 1960 5
# 16 16 09.10.2019 9:28:25 0 28000.00 2140 6
# 17 17 09.10.2019 9:28:25 0 28000.00 2020 6
# 18 18 09.10.2019 9:28:26 0 28000.00 2120 6
# then you are able to use groupby
df.groupby('label').max()
Out[27]:
Unnamed: 0 Unnamed: 1 timestamp Order Price Quantity
label
0 1 09.10.2019 9:15:42 0 27850.00 2040
1 4 09.10.2019 9:18:48 0 27860.85 2340
2 6 09.10.2019 9:21:08 0 27979.00 2020
3 8 09.10.2019 9:21:15 0 27850.00 1800
4 10 09.10.2019 9:21:23 35 28000.00 1840
5 15 09.10.2019 9:28:24 0 28035.00 2020
6 18 09.10.2019 9:28:26 0 28000.00 2140
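Note that groupby('label').max() takes the maximum of each column separately, so one output line can mix values from different rows: for label 1 above, the timestamp 9:18:48 comes from row 4 while the Quantity 2340 comes from row 3. If whole rows are wanted, a sketch along the following lines may help. It uses rows 2 to 6 of the question's data with shortened timestamp strings, and the names col_max and whole_rows are made up for this example.

```python
from itertools import groupby

import pandas as pd

# Rows 2..6 of the question's data (time of day only, kept as strings)
df = pd.DataFrame({
    "timestamp": ["09:15:53", "09:16:54", "09:18:48", "09:21:08", "09:21:08"],
    "Price": [27860.85, 27860.85, 27860.85, 27979.00, 27979.00],
    "Quantity": [1800, 2340, 1500, 1840, 2020],
})

# One label per run of consecutive equal prices, repeated for each row in the run
df["label"] = [
    run_no
    for run_no, (_price, run) in enumerate(groupby(df["Price"]))
    for _value in run
]

# Column-wise maximum: timestamp and Quantity may come from different rows
col_max = df.groupby("label").max()

# Whole rows: look up the index label of the max Quantity in each run
whole_rows = df.loc[df.groupby("label")["Quantity"].idxmax()]

print(col_max)
print(whole_rows)
```

In col_max the first run pairs the latest timestamp with the largest Quantity, whereas whole_rows keeps the timestamp that actually belongs to the row with Quantity 2340.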
Upvotes: 2