Reputation:
Sorry about the vague title it's hard to explain. It's easier to display.
I'm trying subtract values in the same row but based off strings in other columns. Here is an input df:
import pandas as pd
import numpy as np
k = 5
N = 8
d = ({'Time' : np.random.randint(k, k + 100 , size=N),
'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})
df = pd.DataFrame(data=d)
Output:
Events Number1 Number2 Time
0 ABC xx xx 14
1 DEF xx 1 34
2 GHI 1 xx 78
3 JKL xx 1 49
4 ABC xx xx 49
5 DEF xx 2 24
6 GHI 2 xx 19
7 JKL xx 2 67
I want to export values based on the difference in Time
. The first time difference column will be ABC - DEF
and the second column will be GHI - JKL
.
I need to repeat this process a number of times. The example above displays a loop of 2 times. I can use the integers for columns Number1
and Number2
but they aren't in order.
I tried to combine and ffill these columns to display an order. And then use this column as a reference.
for col in ['Number2']:
df[col] = df[col].ffill()
But this creates 5 identical integers when I need 4.
I then manually subtracted the appropriate values via row slicing but it becomes very inefficient when I have to do this numerous times.
Is it possible to create a loop subtracting the intended rows?
For the above example the output would be:
Diff_1 Diff_2
0 -20 29
1 25 -48
Upvotes: 1
Views: 94
Reputation: 8641
import pandas as pd
import numpy as np
k = 5
N = 8
d = ({'Time' : np.random.randint(k, k + 100 , size=N),
'Events' : ['ABC','DEF','GHI','JKL','ABC','DEF','GHI','JKL'],
'Number1' : ['xx','xx',1,'xx','xx','xx',2,'xx'],
'Number2' : ['xx',1,'xx',1,'xx',2,'xx',2]})
df = pd.DataFrame(data=d)
print(df)
Output:
Events Number1 Number2 Time
0 ABC xx xx 8
1 DEF xx 1 54
2 GHI 1 xx 52
3 JKL xx 1 101
4 ABC xx xx 56
5 DEF xx 2 34
6 GHI 2 xx 81
7 JKL xx 2 23
This would have the new col in df
. We only care about the rows for ABC
and GHI
df['diff'] = df['Time'] - df['Time'].shift(-1)
diff = pd.DataFrame({
'diff1' : list(df.loc[df['Events'] == 'ABC', 'diff']),
'diff2' : list(df.loc[df['Events'] == 'GHI', 'diff'])
})
print(diff)
Output:
diff1 diff2
0 -46.0 -49.0
1 22.0 58.0
Upvotes: 2
Reputation: 59579
You can use shift to easily find the difference for a df
that is ordered like that. But with this there are a lot of differences you don't care about. You want the 0th, 4th, 8th.... difference for the first difference and the 2nd, 6th, 10th... difference for the second difference. This is easy to accomplish with .iloc
import pandas as pd
diff = (df.Time-df.Time.shift(-1))
pd.DataFrame({'Diff_1': diff.iloc[::4].values,
'Diff_2': diff.iloc[2::4].values})
# Diff_1 Diff_2
#0 -20.0 29.0
#1 25.0 -48.0
Upvotes: 1