Reputation: 4055
I am trying to loop through a dataframe and collect index values into a dictionary wherever two conditions are met (a row with Entry == 'Y' followed by the first later row with Exit == 'E'), and then resume iterating over the rows from the last index value where the condition was met.
I have this so far:
d = {}
index_number = 12
for i, r in df.iloc[index_number:].iterrows():
    print(index_number)
    if r['Entry'] == 'Y':
        print(i)
        ix_num = i + 1
        for e, ro in df.iloc[ix_num:].iterrows():
            if ro['Exit'] == 'E':
                d[i] = e
                index_number = e
                print(e)
                input('Check')
                break
df:
    Entry Exit
12      Y  NaN
13      Y  NaN
14      Y    E
15      Y    E
16      Y  NaN
17      Y  NaN
18      Y  NaN
19    NaN    E
20      Y  NaN
21    NaN    E
22    NaN    E
23    NaN    E
24      Y  NaN
25      Y  NaN
26    NaN    E
27      Y  NaN
28    NaN    E
29    NaN    E
The problem I am facing is that, for some reason, the updated index_number is not used by the outer loop once both conditions are met.
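A likely explanation, sketched on a small hypothetical frame (toy, start and visited are illustration names, not from the question): the expression df.iloc[index_number:] is evaluated once, when the for statement starts, so assigning index_number = e inside the loop does not change which rows the outer loop visits. There is also a separate issue: iloc slices by position, whereas i and e are index labels, so when the index does not start at 0, df.iloc[i + 1:] does not start at the row labelled i + 1.

```python
import pandas as pd

# Hypothetical toy frame whose index starts at 12, like the table above
toy = pd.DataFrame({'Entry': ['Y', 'Y', None, 'Y', None]},
                   index=[12, 13, 14, 15, 16])

start = 0
visited = []
for label, row in toy.iloc[start:].iterrows():
    visited.append(label)
    start = 3  # rebinding start here does not rebuild the slice being iterated
print(visited)  # [12, 13, 14, 15, 16]

# iloc slices by position, loc slices by label (the start label is included)
print(toy.iloc[13:].index.tolist())  # [] because position 13 is past the end
print(toy.loc[13:].index.tolist())   # [13, 14, 15, 16]
```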
Expected output:
d = {12:14,15:19,20:21,24:26,27:28}
Thanks for your help
Edit:
I am using the following for now:
v = []
x = []
for i, r in df.iterrows():
    if r['Entry'] == 'Y':
        x.append(i)
    if r['Exit'] == 'E':
        v.append(i)
d = {}
exce = []
check_val = 0
for i in x:
    if i > check_val:
        for e in v:
            if e > i and e not in exce:
                d[i] = e
                exce.append(e)
                check_val = e
                break
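For comparison, the same result can probably be obtained in a single pass with a small open/closed flag, instead of the two lists, the nested loop and the exce list. This sketch assumes the Entry and Exit columns hold 'Y'/'E' or NaN as in the table above and that rows are in ascending index order; the frame is rebuilt by hand from that table, and pair_entries_exits is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# The table from the question, rebuilt by hand (NaN where a cell is empty)
entry = ['Y'] * 7 + [np.nan, 'Y'] + [np.nan] * 3 + ['Y', 'Y', np.nan, 'Y', np.nan, np.nan]
exit_ = [np.nan, np.nan, 'E', 'E'] + [np.nan] * 3 + ['E', np.nan] + ['E'] * 3 \
        + [np.nan, np.nan, 'E', np.nan, 'E', 'E']
df = pd.DataFrame({'Entry': entry, 'Exit': exit_}, index=range(12, 30))


def pair_entries_exits(frame):
    """Map each entry label to the first later exit label, skipping overlaps."""
    pairs = {}
    open_entry = None  # label of the current unmatched entry, if any
    for label, row in frame.iterrows():
        if open_entry is None:
            if row['Entry'] == 'Y':
                open_entry = label  # this row's own 'E' cannot close it
        elif row['Exit'] == 'E':
            pairs[open_entry] = label
            open_entry = None  # this row's own 'Y' (if any) is ignored
    return pairs


d = pair_entries_exits(df)
print(d)  # {12: 14, 15: 19, 20: 21, 24: 26, 27: 28}
```

Because a row only opens a trade when none is open, and only closes one when one is open, rows such as 14 and 15 that carry both a Y and an E are handled as in the workaround: row 14 closes the trade opened at 12, and row 15 opens the next one.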
Upvotes: 0
Views: 321
Reputation: 6166
A vectorized approach:
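import pandas as pd

# Stack the 'E' cells and the 'Y' cells into one Series keyed by row label
df = pd.concat([df[df.Exit=='E']['Exit'],df[df.Entry=='Y']['Entry']])
# Move the row labels into an 'index' column, name the values 'label', order by row
df = df.reset_index().rename(columns = {0:'label'}).sort_values('index')
# Drop a label that repeats the one before it, so Y and E alternate
df = df[df['label']!=df['label'].shift(1)]
# Record the row label of the next marker
df['E_index'] = df['index'].shift(-1)
# Keep only Y markers that are directly followed by an E
df = df[(df['label']+df['label'].shift(-1))=='YE']
d = dict(zip(df['index'].astype(int), df['E_index'].astype(int)))
print(d)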
{12: 14, 15: 19, 20: 21, 24: 26, 27: 28}
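One caveat, offered tentatively: rows 14 and 15 carry both a Y and an E, so after sort_values('index') they produce tied index values, and which pairs survive depends on how those ties are ordered (with E placed before Y on row 14, for example, an extra 14: 15 pair appears). A sketch that avoids the tie handling, assuming the same Entry/Exit columns and an index sorted in ascending order, jumps between the sorted entry and exit labels with numpy.searchsorted, so the Python loop runs once per matched pair rather than once per row. The helper name pair_with_searchsorted and the hand-built frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# The table from the question, rebuilt by hand (NaN where a cell is empty)
entry = ['Y'] * 7 + [np.nan, 'Y'] + [np.nan] * 3 + ['Y', 'Y', np.nan, 'Y', np.nan, np.nan]
exit_ = [np.nan, np.nan, 'E', 'E'] + [np.nan] * 3 + ['E', np.nan] + ['E'] * 3 \
        + [np.nan, np.nan, 'E', np.nan, 'E', 'E']
df = pd.DataFrame({'Entry': entry, 'Exit': exit_}, index=range(12, 30))


def pair_with_searchsorted(frame):
    """Pair each entry with the first later exit; the next entry must follow that exit."""
    entries = frame.index[(frame['Entry'] == 'Y').to_numpy()].to_numpy()
    exits = frame.index[(frame['Exit'] == 'E').to_numpy()].to_numpy()
    pairs = {}
    i = 0  # position in entries of the next candidate entry
    while i < len(entries):
        entry_label = entries[i]
        # position of the first exit strictly after this entry
        j = np.searchsorted(exits, entry_label, side='right')
        if j == len(exits):
            break  # no exit left for this entry
        exit_label = exits[j]
        pairs[int(entry_label)] = int(exit_label)
        # position of the first entry strictly after this exit
        i = np.searchsorted(entries, exit_label, side='right')
    return pairs


d = pair_with_searchsorted(df)
print(d)  # {12: 14, 15: 19, 20: 21, 24: 26, 27: 28}
```

Each step is a binary search, so this relies on the index being sorted; an unsorted index would need df.sort_index() first.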
Upvotes: 1