Reputation: 89
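I am trying to get the start and end dates for each consecutive run of a particular price. In my example, the price is $3 for a few days, then increases to $4 for a few days, and later goes back to $3. My current attempt groups by price only, so the two separate $3 periods get merged into one row: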
import pandas as pd
df = pd.DataFrame([
{"Price":3,"ds":"2017-01-01"},
{"Price":3,"ds":"2017-01-02"},
{"Price":3,"ds":"2017-01-03"},
{"Price":3,"ds":"2017-01-04"},
{"Price":3,"ds":"2017-01-05"},
{"Price":4,"ds":"2017-01-06"},
{"Price":4,"ds":"2017-01-07"},
{"Price":4,"ds":"2017-01-08"},
{"Price":4,"ds":"2017-01-09"},
{"Price":3,"ds":"2017-01-10"},
{"Price":3,"ds":"2017-01-11"},
{"Price":3,"ds":"2017-01-12"},
{"Price":3,"ds":"2017-01-13"},
{"Price":3,"ds":"2017-01-14"}
])
start = df.groupby(["Price"])["ds"].min().reset_index()
end = df.groupby(["Price"])["ds"].max().reset_index()
df2 = start.merge(end, how = 'left', on = ["Price"])
df2.rename(columns = {"ds_x":"start_dt", "ds_y":"end_dt"},inplace=True)
Output:
Price  start_dt    end_dt
3      2017-01-01  2017-01-14
4      2017-01-06  2017-01-09
Desired Output:
Price  start_dt    end_dt
3      2017-01-01  2017-01-05
4      2017-01-06  2017-01-09
3      2017-01-10  2017-01-14
Upvotes: 1
Views: 186
Reputation: 75080
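Starting from your original dataframe, you can build a helper series with Series.shift that compares each row's price with the previous row's. Taking the cumulative sum of those "changed" flags gives every consecutive run of the same price its own group id. Group by that id together with Price, aggregate ds with min and max, rename the columns, drop the helper level and reset the index: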
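# True where Price differs from the previous row; cumsum turns the flags into run ids
s = df['Price'].ne(df['Price'].shift()).cumsum()
d = {"min":"start_dt", "max":"end_dt"}
# Group by run id and Price, take min/max date per run, then drop the run-id level
out = (df.groupby([s,'Price'])['ds'].agg(['min','max']).rename(columns=d)
.droplevel(0).reset_index())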
print(out)
   Price    start_dt      end_dt
0      3  2017-01-01  2017-01-05
1      4  2017-01-06  2017-01-09
2      3  2017-01-10  2017-01-14
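To make the helper series easier to follow, here is a small self-contained sketch of the same idea that prints the run ids and uses named aggregation instead of rename/droplevel. The names run_id and runs are only illustrative, and it assumes the rows are already sorted by ds and that ds holds ISO-formatted date strings (so min and max on the strings match chronological order):

```python
import pandas as pd

# Same data as in the question: 5 days at 3, 4 days at 4, 5 days at 3.
df = pd.DataFrame({
    "Price": [3] * 5 + [4] * 4 + [3] * 5,
    "ds": pd.date_range("2017-01-01", periods=14).strftime("%Y-%m-%d"),
})

# True wherever Price differs from the previous row (the first row is always
# True because shift() puts NaN there); cumsum numbers the runs 1, 2, 3.
run_id = df["Price"].ne(df["Price"].shift()).cumsum().rename("run_id")
print(run_id.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]

# One output row per run: the run's price plus its first and last date.
runs = (
    df.groupby(run_id)
      .agg(Price=("Price", "first"), start_dt=("ds", "min"), end_dt=("ds", "max"))
      .reset_index(drop=True)
)
print(runs)
```

If the frame is not sorted by date, sorting it first with df.sort_values("ds") should be enough, since the run detection only looks at neighbouring rows.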
Upvotes: 3