Groupby ID, sort by Time, and divide last by first

Question

I have the following dataframe:

ID  .....  Quantity     Time
54         100          2020-01-01 00:00:04
55         100          2020-01-01 00:00:04
54         88           2020-01-01 00:00:05
54         66           2020-01-01 00:00:06
55         100          2020-01-01 00:00:07
55         88           2020-01-01 00:00:07

I would like to group the dataframe (sorted by time!) by ID and then take the quantity of the last row and divide it by the first row per ID.

The result should look like this:

ID  .....  Quantity     Time                     Result
54         100          2020-01-01 00:00:04
54         88           2020-01-01 00:00:05
54         66           2020-01-01 00:00:06      0.66
55         100          2020-01-01 00:00:04      
55         100          2020-01-01 00:00:07
55         88           2020-01-01 00:00:07      0.88

So far I used the following code to get the first and the last row for every ID.

g = df.sort_values(by=['Time']).groupby('ID')

df_new=(pd.concat([g.head(1), g.tail(1)])
   .sort_values(by='ID')
   .reset_index(drop=True))

and then I used the following code to get the Result of the division:

df_new['Result'] = df_new['Quantity'].iloc[1::2].div(df_new['Quantity'].shift())

The problem is that the resulting dataframe does not stay sorted by time. It is really important that I take the last quantity (in time) per ID and divide it by the first quantity (in time) per ID.

Thanks for any hints where I need to change the code!

jezrael · Accepted Answer

Each ID has three rows here, not a pair, so first convert the Time column to datetime if necessary with to_datetime, then sort by both columns with DataFrame.sort_values, and finally use the second or third solution from the previous answer:

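# parse Time so that sorting is chronological, not lexicographic
df['Time'] = pd.to_datetime(df['Time'])
# sort by ID first, then by Time within each ID
df = df.sort_values(['ID','Time'])

# first Quantity of each ID, repeated on every row of that ID
first = df.groupby('ID')['Quantity'].transform('first')
# keep only the last row per ID; div aligns on the index, so other rows get NaN
df['Result'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
print(df)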

   ID  Quantity                Time  Result
0  54       100 2020-01-01 00:00:04     NaN
2  54        88 2020-01-01 00:00:05     NaN
3  54        66 2020-01-01 00:00:06    0.66
1  55       100 2020-01-01 00:00:04     NaN
4  55       100 2020-01-01 00:00:07     NaN
5  55        88 2020-01-01 00:00:07    0.88
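
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The function name last_over_first and its parameters are my own for illustration, not part of pandas, and the sample frame only contains the three columns shown in the question. It uses DataFrame.duplicated with keep='last' as a mask instead of drop_duplicates, which should give the same result.

```python
import pandas as pd


def last_over_first(df, id_col="ID", time_col="Time", value_col="Quantity"):
    """Hypothetical helper: last value / first value per ID, in time order."""
    out = df.copy()
    # Parse timestamps so the sort is chronological.
    out[time_col] = pd.to_datetime(out[time_col])
    # Sort by ID, then by time within each ID.
    out = out.sort_values([id_col, time_col])
    # First value of each ID, broadcast to every row of that ID.
    first = out.groupby(id_col)[value_col].transform("first")
    # True only on the last row of each ID (after sorting).
    is_last = ~out.duplicated(subset=id_col, keep="last")
    # Ratio on the last row of each ID, NaN everywhere else.
    out["Result"] = (out[value_col] / first).where(is_last)
    return out


# Sample data from the question (only the columns that were shown).
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 88, 66, 100, 88],
    "Time": [
        "2020-01-01 00:00:04",
        "2020-01-01 00:00:04",
        "2020-01-01 00:00:05",
        "2020-01-01 00:00:06",
        "2020-01-01 00:00:07",
        "2020-01-01 00:00:07",
    ],
})

result = last_over_first(df)
print(result)
```

Note that the last two rows of ID 55 share the same timestamp, so which one counts as "last" depends on the sort keeping tied rows in their original order. If that matters in real data, adding a tie-breaker column to sort_values is probably safer.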
