Reputation: 585
My dataframe looks like this:
+-----+-------+----------+-------+
| No | Group | refGroup | Value |
+-----+-------+----------+-------+
| 123 | A1 | A1 | 5.0 |
| 123 | B1 | A1 | 7.3 |
| 123 | B2 | A1 | 8.9 |
| 123 | B3 | B1 | 7.9 |
| 465 | A1 | A1 | 1.4 |
| 465 | B1 | A1 | 4.5 |
| 465 | B2 | B1 | 7.3 |
+-----+-------+----------+-------+
Now I need to add another column which conatains the difference between the value of column Value
from the current row and the value of column Value
from the row with the same number (No
) and the group (Group
) that is written in refGroup
.
Exeption: If refGroup
equals Group
, Value
and refValue
are the same.
So the result should be:
+-----+-------+----------+-------+----------+
| No | Group | refGroup | Value | refValue |
+-----+-------+----------+-------+----------+
| 123 | A1 | A1 | 5.0 | 5.0 |
| 123 | B1 | A1 | 7.3 | 2.3 |
| 123 | B2 | A1 | 8.9 | 3.9 |
| 123 | B3 | B1 | 7.9 | 0.6 |
| 465 | A1 | A1 | 1.4 | 1.4 |
| 465 | B1 | A1 | 4.5 | 3.1 |
| 465 | B2 | B1 | 7.3 | 2.8 |
+-----+-------+----------+-------+----------+
Explanation for the first two rows:
First row: refGroup
equals Group
-> refValue
= Value
Second row: search for the row with the same No
(123) and refGroup
as Group
(A1) and calculate Value
of the current row minus Value
of the referenced row (7.3 - 5.0 = 2.3).
I thought I might need to use groupby() and apply(), but how?
Hope my example is detailed enough, if you need any further information, please ask :)
Upvotes: 0
Views: 57
Reputation: 570
With comprehension list you could do:
df['refValue'] = [ row['Value'] - float(df.loc[(df['No']==row['No']) & (df['Group']==row['refGroup']),'Value'].values) if row['refGroup']!=row['Group'] else row['Value'] for index, row in df.iterrows() ]
Upvotes: 1
Reputation: 153460
One way is to use a database SQL like technique; use 'self-join' with merge
. You merge/join a dataframe to itself using left_on
and right_on
to line up 'Group' with 'refGroup' then subtract the value from each dataframe record:
df_out = df.merge(df,
left_on=['No','refGroup'],
right_on=['No','Group'],
suffixes=('','_ref'))
df['refValue'] = np.where(df_out['Group'] == df_out['refGroup'],
df_out['value'],
df_out['value'] - df_out['value_ref'])
df
Output:
No Group refGroup value refValue
0 123 A1 A1 5.0 5.0
1 123 B1 A1 7.3 2.3
2 123 B2 A1 8.9 3.9
3 123 B3 B1 7.9 0.6
4 465 A1 A1 1.4 1.4
5 465 B1 A1 4.5 3.1
6 465 B2 B1 7.3 2.8
Upvotes: 3