Python - Add a column to a dataframe containing a value from another row based on condition

Question

My dataframe looks like this:

+-----+-------+----------+-------+
| No  | Group | refGroup | Value |
+-----+-------+----------+-------+
| 123 | A1    | A1       |   5.0 |
| 123 | B1    | A1       |   7.3 |
| 123 | B2    | A1       |   8.9 |
| 123 | B3    | B1       |   7.9 |
| 465 | A1    | A1       |   1.4 |
| 465 | B1    | A1       |   4.5 |
| 465 | B2    | B1       |   7.3 |
+-----+-------+----------+-------+

Now I need to add another column which conatains the difference between the value of column Value from the current row and the value of column Value from the row with the same number (No) and the group (Group) that is written in refGroup.

Exeption: If refGroup equals Group, Value and refValue are the same.

So the result should be:

+-----+-------+----------+-------+----------+
| No  | Group | refGroup | Value | refValue |
+-----+-------+----------+-------+----------+
| 123 | A1    | A1       |   5.0 |      5.0 |
| 123 | B1    | A1       |   7.3 |      2.3 |
| 123 | B2    | A1       |   8.9 |      3.9 |
| 123 | B3    | B1       |   7.9 |      0.6 |
| 465 | A1    | A1       |   1.4 |      1.4 |
| 465 | B1    | A1       |   4.5 |      3.1 |
| 465 | B2    | B1       |   7.3 |      2.8 |
+-----+-------+----------+-------+----------+

Explanation for the first two rows:

First row: refGroup equals Group -> refValue = Value

Second row: search for the row with the same No (123) and refGroup as Group (A1) and calculate Value of the current row minus Value of the referenced row (7.3 - 5.0 = 2.3).

I thought I might need to use groupby() and apply(), but how?

Hope my example is detailed enough, if you need any further information, please ask :)

Scott Boston · Accepted Answer

One way is to use a database SQL like technique; use 'self-join' with merge. You merge/join a dataframe to itself using left_on and right_on to line up 'Group' with 'refGroup' then subtract the value from each dataframe record:

df_out = df.merge(df, 
                  left_on=['No','refGroup'], 
                  right_on=['No','Group'], 
                  suffixes=('','_ref'))

df['refValue'] = np.where(df_out['Group'] == df_out['refGroup'],
                          df_out['value'],
                          df_out['value'] - df_out['value_ref'])

df

Output:

    No Group refGroup  value  refValue
0  123    A1       A1    5.0       5.0
1  123    B1       A1    7.3       2.3
2  123    B2       A1    8.9       3.9
3  123    B3       B1    7.9       0.6
4  465    A1       A1    1.4       1.4
5  465    B1       A1    4.5       3.1
6  465    B2       B1    7.3       2.8

Python - Add a column to a dataframe containing a value from another row based on condition

Answers (2)

Related Questions