Subtracting values from a row based off other columns

Question

Sorry about the vague title it's hard to explain. It's easier to display.

I'm trying subtract values in the same row but based off strings in other columns. Here is an input df:

import pandas as pd
import numpy as np

k = 5
N = 8

d = ({'Time' : np.random.randint(k, k + 100 , size=N),
    'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
    'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
    'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})

df = pd.DataFrame(data=d)

Output:

  Events Number1 Number2  Time
0    ABC      xx      xx    14
1    DEF      xx       1    34
2    GHI       1      xx    78
3    JKL      xx       1    49
4    ABC      xx      xx    49
5    DEF      xx       2    24
6    GHI       2      xx    19
7    JKL      xx       2    67

I want to export values based on the difference in Time. The first time difference column will be ABC - DEF and the second column will be GHI - JKL.

I need to repeat this process a number of times. The example above displays a loop of 2 times. I can use the integers for columns Number1 and Number2 but they aren't in order.

I tried to combine and ffill these columns to display an order. And then use this column as a reference.

for col in ['Number2']:
    df[col] = df[col].ffill()

But this creates 5 identical integers when I need 4.

I then manually subtracted the appropriate values via row slicing but it becomes very inefficient when I have to do this numerous times.

Is it possible to create a loop subtracting the intended rows?

For the above example the output would be:

   Diff_1  Diff_2
0     -20      29
1      25     -48

harpan · Accepted Answer

import pandas as pd
import numpy as np

k = 5
N = 8

d = ({'Time' : np.random.randint(k, k + 100 , size=N),
    'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
    'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
    'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})

df = pd.DataFrame(data=d)
print(df)

Output:

      Events Number1 Number2  Time
0    ABC      xx      xx     8
1    DEF      xx       1    54
2    GHI       1      xx    52
3    JKL      xx       1    101
4    ABC      xx      xx    56
5    DEF      xx       2    34
6    GHI       2      xx    81
7    JKL      xx       2    23

This would have the new col in df. We only care about the rows for ABC and GHI

df['diff'] = df['Time'] - df['Time'].shift(-1)
diff = pd.DataFrame({
    'diff1' : list(df.loc[df['Events'] == 'ABC', 'diff']),
    'diff2' : list(df.loc[df['Events'] == 'GHI', 'diff'])
})
print(diff)

Output:

   diff1  diff2
0  -46.0  -49.0
1   22.0   58.0

Subtracting values from a row based off other columns

Answers (2)

Related Questions