Pavle
Pavle

Reputation: 207

Calculate difference between grouped elements in pandas

I have a dataframe that looks like this:

df = pd.DataFrame({'patient_id': ['p1', 'p2', 'p3', 'p1', 'p2', 'p3'],
              'treatment_time': ['pre', 'pre', 'pre', 'post', 'post', 'post'],
              'val1': [1, 4, 9, 2, 6, 10],
              'val2': [3, 5, 11, 1, 4, 9],
              'val3': [2, 4, 6, 3, 5, 7],
              })
print(df)

  patient_id treatment_time  val1  val2  val3
0         p1            pre     1     3     2
1         p2            pre     4     5     4
2         p3            pre     9    11     6
3         p1           post     2     1     3
4         p2           post     6     4     5
5         p3           post    10     9     7

My goal is to group data by patient_id and treatment_time and to calculate the difference between pre-treatment and post-treatment values. The expected output should look like this:

  patient_id  val1  val2  val3
0         p1   1.0  -2.0   1.0
1         p2   2.0  -1.0   1.0
2         p3   1.0  -2.0   1.0

Upvotes: 1

Views: 61

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

Create a MultiIndex dataFrame, via pd.pivot and subtract pre from post:

temp = df.pivot('patient_id', 'treatment_time').swaplevel(axis=1)

temp['post'] - temp['pre']

             val1  val2  val3
patient_id                  
p1             1    -2     1
p2             2    -1     1
p3             1    -2     1

Upvotes: 1

mozway
mozway

Reputation: 260725

If you always have only one 'pre' and 'post' per patient_id and the rows are sorted to always have pre first and post next, a simple trick would be:

cols = df.filter(like='val').columns
out = df.groupby('patient_id')[cols].diff().dropna(how='all')

output:

>>> out
   val1  val2  val3
3   1.0  -2.0   1.0
4   2.0  -1.0   1.0
5   1.0  -2.0   1.0
sorting the rows (if needed)
df = df.sort_values(by='treatment_time', ascending=False)

Upvotes: 3

jezrael
jezrael

Reputation: 862681

Use DataFrame.set_index with subtract in MultiIndex DataFrame selected by DataFrame.xs:

df1 = df.set_index(['patient_id','treatment_time'])
print (df1)
                           val1  val2  val3
patient_id treatment_time                  
p1         pre                1     3     2
p2         pre                4     5     4
p3         pre                9    11     6
p1         post               2     1     3
p2         post               6     4     5
p3         post              10     9     7

df = df1.xs('post', level=1).sub(df1.xs('pre', level=1)).reset_index()
print (df)
  patient_id  val1  val2  val3
0         p1     1    -2     1
1         p2     2    -1     1
2         p3     1    -2     1

Upvotes: 3

Related Questions