awkwardpotato
awkwardpotato

Reputation: 41

Generate difference between the first and last measurement from aggregated dataset


I have a dataset with several repeated measures over a week for each participant, which I aggregated using `groupby` and `.agg` functions to have the first and the last measurement taken as follows:

df2=df.groupby("ID").agg(['first', 'last']).stack()

Now it looks like this:

| ID |      | Age | Score | Measure |
| -- | ---- | --- | ----- | ------  |
| 0  | first| 69  | 48    | 34      |
|    | last | 69  | 52    | 33      |
| 1  | first| 58  | 28    | 89      |
|    | last | 58  | 14    | 34      |

I want to create a dataset using first and last value aggregates to create a difference between these values for Score and Measure variables. It should look like this:

| ID | Age | Score_diff | Measure_diff |
| -- | --- | ---------- | ------------ |
| 0  | 69  | 4          | 1            |
| 1  | 58  | 14         | 55           |

I am not sure how to approach this (I'm a Python newbie). Is there anyone who could help me out?

Best wishes

Upvotes: 2

Views: 82

Answers (2)

Anurag Dabas
Anurag Dabas

Reputation: 24314

import pandas as pd
import numpy as np

you can also do this by:

df=df.reset_index()

result=(df.groupby(['ID','Age'])
 .agg(lambda x:abs(np.diff(x)))
 .rename(columns=lambda x:x+'_diff')
 .reset_index())

Now If you print result you will get:

ID  Age   Score_diff    Measure_diff        
0   69      4           1
1   58      14          55

Upvotes: 2

Camilo Martínez M.
Camilo Martínez M.

Reputation: 1575

You could try the following:

import pandas as pd

df2 = df.groupby("id").agg(['first', 'last']).stack().reset_index().set_index("id")

>>> df2
       level_1  Age  Score  Measure
    id                             
    0    first   69     48       34
    0     last   69     52       33
    1    first   58     28       89
    1     last   58     14       34

You groupby("id") so that you apply the following functions to each one of them and agg them to a list.

df3 = df2.groupby("id").agg(list)
df3["Age"] = df3["Age"].apply(lambda x: x[0]) # x[0] and x[1] are equal

>>> df3
              level_1  Age     Score   Measure
    id                                        
    0   [first, last]   69  [48, 52]  [34, 36]
    1   [first, last]   58  [28, 14]  [89, 34]

Then you apply a lambda function to get the difference between the two values.

df3["Score_diff"] = df3["Score"].apply(lambda x: abs(x[1] - x[0]))
df3["Measure_diff"] = df3["Measure"].apply(lambda x: abs(x[1] - x[0]))

And delete the columns you don't want

df3 = df3.drop(columns=["level_1", "Score", "Measure"])

>>> df3
        Age  Score_diff  Measure_diff
    id                               
    0    69           4             2
    1    58          14            55

Upvotes: 2

Related Questions