Reputation: 644
I need to loop over a dataframe and check if the odd rows of a specific column are equal to a given variable( same for even rows).
This is my code:
mydf = pd.read_excel(test.xlsx, header=0, index= False)
mydf = mydf.sort_values(by='Time')
if ((mydf['Door Name'].iloc[::2]=='RDC_OUT-1') & (mydf['Door Name'].iloc[1::2]=='RDC_IN-1')):
for i in range (l):
mydf['diff'] = mydf['Times'].iloc[1::2].to_numpy() - mydf['Times'].iloc[::2]
Total = mydf['diff'].sum()
print('Total: ',Total)
but when I run it I get this error:
if ((mydf['Door Name'].iloc[::2]=='RDC_OUT-1') & (mydf['Door Name'].iloc[1::2]=='RDC_IN-1')):
File "C:\Users\khoul\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
this is my dataframe:
Door name Time Last Name First Name
RDC_IN-1 05/08/2019 15:23:00 aa bb
RDC_OUT-1 05/08/2019 12:39:00 aa bb
RDC_IN-1 05/08/2019 12:13:00 aa bb
RDC_OUT-1 05/08/2019 09:10:00 aa bb
I don't know why it wouldn't accept it!
Upvotes: 1
Views: 2254
Reputation: 863146
My idea is first select all sorted rows by Time
with RDC_OUT-1
and next RDC_IN-1
values, another rows are filter out:
mydf = df.sort_values(by='Time')
m1 = mydf['Door name'] == 'RDC_OUT-1'
m2 = mydf['Door name'] == 'RDC_IN-1'
m11 = m1 & m2.shift(-1)
m22 = m1.shift() & m2
df = mydf[m11 | m22]
print (mydf)
Door name Time Last Name First Name
3 RDC_OUT-1 2019-05-08 09:10:00 aa bb
2 RDC_IN-1 2019-05-08 12:13:00 aa bb
1 RDC_OUT-1 2019-05-08 12:39:00 aa bb
0 RDC_IN-1 2019-05-08 15:23:00 aa bb
So because get same number of IN
and OUT
rows solution from comments should working nice:
Total = (df.loc[df['Door name']=='RDC_IN-1','Time'] -
df.loc[df['Door name']=='RDC_OUT-1','Time'].to_numpy()).sum()
print('Total: ',Total)
Total: 0 days 05:47:00
Upvotes: 1