Michelle

Reputation: 263

Groupby ID, sort by Time, and divide last by first

I have the following dataframe:

ID  .....  Quantity     Time
54         100          2020-01-01 00:00:04
55         100          2020-01-01 00:00:04
54         88           2020-01-01 00:00:05
54         66           2020-01-01 00:00:06
55         100          2020-01-01 00:00:07
55         88           2020-01-01 00:00:07     

I would like to group the dataframe (sorted by time!) by ID and then take the quantity of the last row and divide it by the first row per ID.

The result should look like this:

ID  .....  Quantity     Time                     Result
54         100          2020-01-01 00:00:04
54         88           2020-01-01 00:00:05
54         66           2020-01-01 00:00:06      0.66
55         100          2020-01-01 00:00:04      
55         100          2020-01-01 00:00:07
55         88           2020-01-01 00:00:07      0.88

So far I used the following code to get the first and the last row for every ID.

g = df.sort_values(by=['Time']).groupby('ID')

df_new=(pd.concat([g.head(1), g.tail(1)])
   .sort_values(by='ID')
   .reset_index(drop=True))

and then I used the following code to get the Result of the division:

df_new['Result'] = df_new['Quantity'].iloc[1::2].div(df_new['Quantity'].shift())

The problem is that the dataframe does not stay sorted by time. It is really important that I take the last quantity (in time) per ID and divide it by the first quantity (in time) per ID.

Thanks for any hints where I need to change the code!

Upvotes: 3

Views: 441

Answers (2)

sammywemmy

Reputation: 28709

Convert Time to datetime:

df["Time"] = pd.to_datetime(df["Time"])

Sort on ID and Time:

df = df.sort_values(["ID", "Time"])

Group on ID:

grouping = df.groupby("ID").Quantity

Get results for the division of last by first:

result = grouping.last().div(grouping.first()).array

Now, you can assign the results back to the original dataframe:

df.loc[df.Quantity.eq(grouping.transform("last")), "Result"] = result
df

   ID  Quantity                Time  Result
0  54       100 2020-01-01 00:00:04     NaN
2  54        88 2020-01-01 00:00:05     NaN
3  54        66 2020-01-01 00:00:06    0.66
1  55       100 2020-01-01 00:00:04     NaN
4  55       100 2020-01-01 00:00:07     NaN
5  55        88 2020-01-01 00:00:07    0.88
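
One caveat, offered as a sketch rather than as part of the approach above: the mask df.Quantity.eq(grouping.transform("last")) also matches any earlier row whose Quantity happens to equal the group's last value, so it can select more rows than there are groups. Assuming the frame is sorted by ID and Time, marking the last row of each ID by position sidesteps that. The sample data below is hypothetical: it follows the question, but ID 54 repeats its final quantity and the timestamps are made distinct.

```python
import pandas as pd

# Hypothetical data modelled on the question; ID 54 repeats its final
# quantity (66) in an earlier row to show the duplicate-value case.
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 66, 66, 100, 88],
    "Time": pd.to_datetime([
        "2020-01-01 00:00:04", "2020-01-01 00:00:04",
        "2020-01-01 00:00:05", "2020-01-01 00:00:06",
        "2020-01-01 00:00:07", "2020-01-01 00:00:08",
    ]),
})

df = df.sort_values(["ID", "Time"])

grouping = df.groupby("ID")["Quantity"]
# Last-over-first ratio, broadcast to every row of the group.
ratio = grouping.transform("last") / grouping.transform("first")

# True only on the final row of each ID, even when quantities repeat.
is_last = ~df.duplicated("ID", keep="last")

# Keep the ratio on the last row per ID and leave NaN elsewhere.
df["Result"] = ratio.where(is_last)
print(df)
```

Because the mask is built from row position rather than from the Quantity values, exactly one row per ID receives a Result.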

Upvotes: 1

jezrael

Reputation: 863156

Each ID here has three rows, not a pair, so first convert the column to datetime with to_datetime if necessary, then sort by both columns with DataFrame.sort_values, and finally use the second or third solution from the previous answer:

df['Time'] = pd.to_datetime(df['Time'])
df = df.sort_values(['ID','Time'])

first = df.groupby('ID')['Quantity'].transform('first')
df['Result'] = df.drop_duplicates('ID', keep='last')['Quantity'].div(first)
print(df)

   ID  Quantity                Time  Result
0  54       100 2020-01-01 00:00:04     NaN
2  54        88 2020-01-01 00:00:05     NaN
3  54        66 2020-01-01 00:00:06    0.66
1  55       100 2020-01-01 00:00:04     NaN
4  55       100 2020-01-01 00:00:07     NaN
5  55        88 2020-01-01 00:00:07    0.88
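
If only one ratio per ID is needed, rather than a Result column aligned to the original rows, a compact summary may be enough. The following is a minimal sketch of that alternative, not part of the solution above; the name summary is made up for illustration, and the elided "....." columns from the question are left out.

```python
import pandas as pd

# Sample data copied from the question (elided columns omitted).
df = pd.DataFrame({
    "ID": [54, 55, 54, 54, 55, 55],
    "Quantity": [100, 100, 88, 66, 100, 88],
    "Time": pd.to_datetime([
        "2020-01-01 00:00:04", "2020-01-01 00:00:04",
        "2020-01-01 00:00:05", "2020-01-01 00:00:06",
        "2020-01-01 00:00:07", "2020-01-01 00:00:07",
    ]),
})

# First and last quantity per ID after sorting by ID and Time.
summary = (
    df.sort_values(["ID", "Time"])
      .groupby("ID")["Quantity"]
      .agg(["first", "last"])
)

# One row per ID: last quantity divided by first quantity.
summary["Result"] = summary["last"] / summary["first"]
print(summary)
```

This gives a frame indexed by ID, which can be merged back onto the original data if the per-row layout is needed later.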

Upvotes: 2
