user9394674
user9394674

Reputation:

Subtracting values from a row based off other columns

Sorry about the vague title it's hard to explain. It's easier to display.

I'm trying subtract values in the same row but based off strings in other columns. Here is an input df:

import pandas as pd
import numpy as np

k = 5
N = 8

d = ({'Time' : np.random.randint(k, k + 100 , size=N),
    'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
    'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
    'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})

df = pd.DataFrame(data=d)

Output:

  Events Number1 Number2  Time
0    ABC      xx      xx    14
1    DEF      xx       1    34
2    GHI       1      xx    78
3    JKL      xx       1    49
4    ABC      xx      xx    49
5    DEF      xx       2    24
6    GHI       2      xx    19
7    JKL      xx       2    67

I want to export values based on the difference in Time. The first time difference column will be ABC - DEF and the second column will be GHI - JKL.

I need to repeat this process a number of times. The example above displays a loop of 2 times. I can use the integers for columns Number1 and Number2 but they aren't in order.

I tried to combine and ffill these columns to display an order. And then use this column as a reference.

for col in ['Number2']:
    df[col] = df[col].ffill() 

But this creates 5 identical integers when I need 4.

I then manually subtracted the appropriate values via row slicing but it becomes very inefficient when I have to do this numerous times.

Is it possible to create a loop subtracting the intended rows?

For the above example the output would be:

   Diff_1  Diff_2
0     -20      29
1      25     -48

Upvotes: 1

Views: 94

Answers (2)

harpan
harpan

Reputation: 8641

import pandas as pd
import numpy as np

k = 5
N = 8

d = ({'Time' : np.random.randint(k, k + 100 , size=N),
    'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
    'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
    'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})

df = pd.DataFrame(data=d)
print(df)

Output:

      Events Number1 Number2  Time
0    ABC      xx      xx     8
1    DEF      xx       1    54
2    GHI       1      xx    52
3    JKL      xx       1    101
4    ABC      xx      xx    56
5    DEF      xx       2    34
6    GHI       2      xx    81
7    JKL      xx       2    23

This would have the new col in df. We only care about the rows for ABC and GHI

df['diff'] = df['Time'] - df['Time'].shift(-1)
diff = pd.DataFrame({
    'diff1' : list(df.loc[df['Events'] == 'ABC', 'diff']),
    'diff2' : list(df.loc[df['Events'] == 'GHI', 'diff'])
})
print(diff)

Output:

   diff1  diff2
0  -46.0  -49.0
1   22.0   58.0

Upvotes: 2

ALollz
ALollz

Reputation: 59579

You can use shift to easily find the difference for a df that is ordered like that. But with this there are a lot of differences you don't care about. You want the 0th, 4th, 8th.... difference for the first difference and the 2nd, 6th, 10th... difference for the second difference. This is easy to accomplish with .iloc

import pandas as pd
diff = (df.Time-df.Time.shift(-1))

pd.DataFrame({'Diff_1': diff.iloc[::4].values,
              'Diff_2': diff.iloc[2::4].values})
#   Diff_1  Diff_2
#0   -20.0    29.0
#1    25.0   -48.0

Upvotes: 1

Related Questions