Reputation: 263
I have the following dataframe:
ID  .....  Quantity  Time
54         100       2020-01-01 00:00:04
55         100       2020-01-01 00:00:04
54         88        2020-01-01 00:00:05
54         66        2020-01-01 00:00:06
55         100       2020-01-01 00:00:07
55         88        2020-01-01 00:00:07
I would like to group the dataframe (sorted by time!) by ID and then, per ID, divide the quantity of the last row by the quantity of the first row.
The result should look like this:
ID  .....  Quantity  Time                 Result
54         100       2020-01-01 00:00:04
54         88        2020-01-01 00:00:05
54         66        2020-01-01 00:00:06  0.66
55         100       2020-01-01 00:00:04
55         100       2020-01-01 00:00:07
55         88        2020-01-01 00:00:07  0.88
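So far I have used the following code to get the first and the last row for every ID:
import pandas as pd

# Sort by time, then group by ID
g = df.sort_values(by=['Time']).groupby('ID')
# Stack the first and the last row of every ID, then order the result by ID
df_new = (pd.concat([g.head(1), g.tail(1)])
          .sort_values(by='ID')
          .reset_index(drop=True))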
Then I used the following code to compute the result of the division:
# Divide each last row (odd positions) by the row just before it (the first row)
df_new['Result'] = df_new['Quantity'].iloc[1::2].div(df_new['Quantity'].shift())
The problem is that the dataframe does not stay sorted by time. It is really important that, for each ID, I divide the last quantity in time by the first quantity in time.
Thanks for any hints on where I need to change the code!
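One likely culprit, stated as an assumption rather than a confirmed diagnosis: the second `sort_values(by='ID')` uses pandas' default `kind='quicksort'`, which is not a stable sort, so rows sharing an ID are not guaranteed to keep the time order produced by the first sort. A minimal sketch of the same head/tail approach, assuming `Time` can be parsed by `pd.to_datetime`, that sorts on both `ID` and `Time` so each first row stays ahead of its last row (the first sort also uses `kind="stable"` so that the tie at 00:00:07 for ID 55 resolves in row order):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 88, 66, 100, 88],
    "Time": ["2020-01-01 00:00:04", "2020-01-01 00:00:04",
             "2020-01-01 00:00:05", "2020-01-01 00:00:06",
             "2020-01-01 00:00:07", "2020-01-01 00:00:07"],
})
df["Time"] = pd.to_datetime(df["Time"])

# Stable sort by time, so equal timestamps keep their original row order.
g = df.sort_values(by="Time", kind="stable").groupby("ID")

# Re-sort on ID *and* Time instead of ID alone, so that within each ID
# the first row (head) always precedes the last row (tail).
df_new = (pd.concat([g.head(1), g.tail(1)])
          .sort_values(by=["ID", "Time"])
          .reset_index(drop=True))

# Odd positions hold each ID's last row; shift() lines them up with the
# preceding first row, and div aligns on the index (other rows get NaN).
df_new["Result"] = df_new["Quantity"].iloc[1::2].div(df_new["Quantity"].shift())
print(df_new)
```

Sorting on both columns makes the required order explicit instead of relying on how the sort breaks ties between rows with the same ID.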
Upvotes: 3
Views: 441
Reputation: 28709
Convert Time to datetime:
df["Time"] = pd.to_datetime(df["Time"])
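Why this conversion matters, sketched under the assumption that `Time` may currently hold strings: text is compared character by character, so a timestamp whose formatting differs even slightly can sort out of chronological order, while parsed datetimes compare by actual time. The malformed value below is hypothetical.

```python
import pandas as pd

# Hypothetical column with one malformed entry (hyphen instead of a space).
as_text = ["2020-01-01 00:00:04", "2020-01-01-00:00:05", "2020-01-01 00:00:06"]

# Plain strings sort character by character; " " comes before "-",
# so the 00:00:05 entry lands after 00:00:06.
print(sorted(as_text))

# Parsed datetimes compare by actual time instead of by characters.
parsed = pd.to_datetime(pd.Series(["2020-01-01 00:00:06", "2020-01-01 00:00:04"]))
print(parsed.sort_values().dt.second.tolist())
```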
Sort on ID and Time:
df = df.sort_values(["ID", "Time"])
Group on ID:
grouping = df.groupby("ID").Quantity
Divide the last quantity by the first quantity for each ID:
result = grouping.last().div(grouping.first()).array
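To see what `result` holds before it is written back, here is a self-contained sketch on the question's sample data. The ratio is one value per ID, indexed by ID. Note that `GroupBy.first` and `GroupBy.last` return the first and last *non-null* value in each group, so a missing `Quantity` would be skipped rather than propagated.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 88, 66, 100, 88],
    "Time": pd.to_datetime([
        "2020-01-01 00:00:04", "2020-01-01 00:00:04",
        "2020-01-01 00:00:05", "2020-01-01 00:00:06",
        "2020-01-01 00:00:07", "2020-01-01 00:00:07"]),
})
df = df.sort_values(["ID", "Time"])

grouping = df.groupby("ID").Quantity
# Last quantity divided by first quantity, one entry per ID.
ratio = grouping.last().div(grouping.first())
print(ratio)
```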
Now you can assign the results back to the original dataframe:
df.loc[df.Quantity.eq(grouping.transform("last")), "Result"] = result
df
   ID  Quantity                Time  Result
0  54       100 2020-01-01 00:00:04     NaN
2  54        88 2020-01-01 00:00:05     NaN
3  54        66 2020-01-01 00:00:06    0.66
1  55       100 2020-01-01 00:00:04     NaN
4  55       100 2020-01-01 00:00:07     NaN
5  55        88 2020-01-01 00:00:07    0.88
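A caveat on selecting rows by `Quantity` equality with the group's last value: if an earlier row of an ID happens to have the same quantity as its last row, the mask selects more than one row for that ID, and the number of selected rows no longer matches the number of ratios. A minimal sketch with hypothetical data (ID 54 given quantities 66, 88, 66) that instead marks each ID's final row by position with `duplicated(keep="last")`; on a frame sorted by ID, those rows appear in the same ID order as the groupby result. `.to_numpy()` is used so the values are assigned by position rather than aligned on the ID index.

```python
import pandas as pd

# Hypothetical data: ID 54 repeats its last quantity (66) in an earlier row.
df = pd.DataFrame({
    "ID": [54, 54, 54, 55, 55],
    "Quantity": [66, 88, 66, 100, 88],
    "Time": pd.to_datetime([
        "2020-01-01 00:00:04", "2020-01-01 00:00:05", "2020-01-01 00:00:06",
        "2020-01-01 00:00:04", "2020-01-01 00:00:07"]),
})
df = df.sort_values(["ID", "Time"])
grouping = df.groupby("ID").Quantity

# Equality with the last value matches two rows for ID 54 here.
eq_mask = df.Quantity.eq(grouping.transform("last"))

# Position-based mask: True only on the final row of each ID.
last_mask = ~df["ID"].duplicated(keep="last")

df.loc[last_mask, "Result"] = grouping.last().div(grouping.first()).to_numpy()
print(df)
```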
Upvotes: 1
Reputation: 863156
The ID values do not come in pairs but in triples, so first convert the column to datetime with to_datetime if necessary, then sort by both columns with DataFrame.sort_values, and finally use the second or third solution from the previous answer:
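import pandas as pd
df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values(['ID','Time'])
# First Quantity of each ID, broadcast to every row of that ID
first = df.groupby('ID')['Quantity'].transform('first')
# Keep only the last row per ID and divide; div aligns on the index,
# so all other rows get NaN
df['Result'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
print(df)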
   ID  Quantity                Time  Result
0  54       100 2020-01-01 00:00:04     NaN
2  54        88 2020-01-01 00:00:05     NaN
3  54        66 2020-01-01 00:00:06    0.66
1  55       100 2020-01-01 00:00:04     NaN
4  55       100 2020-01-01 00:00:07     NaN
5  55        88 2020-01-01 00:00:07    0.88
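For comparison, a minimal sketch of the same idea without `drop_duplicates`: both the first and the last `Quantity` are broadcast to every row with `transform`, and `Series.where` keeps the ratio only on each ID's final row, so every intermediate result shares `df`'s index.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 88, 66, 100, 88],
    "Time": ["2020-01-01 00:00:04", "2020-01-01 00:00:04",
             "2020-01-01 00:00:05", "2020-01-01 00:00:06",
             "2020-01-01 00:00:07", "2020-01-01 00:00:07"],
})
df["Time"] = pd.to_datetime(df["Time"])
df = df.sort_values(["ID", "Time"])

quantity = df.groupby("ID")["Quantity"]
# Per-row ratio of the group's last quantity to its first quantity.
ratio = quantity.transform("last") / quantity.transform("first")

# Keep the ratio only on each ID's final row; other rows become NaN.
df["Result"] = ratio.where(~df["ID"].duplicated(keep="last"))
print(df)
```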
Upvotes: 2