Reputation: 501
My input:
   frame  user1  user2  sum_result
0      0      0      0           0
1      1      0      0           0
2      2      0      1           1
3      3      1      1           2
4      4      1      0           1
5      5      0      0           0
I want to apply my_func with a condition on sum_result.
My condition is: if sum_result = 2, return 'ICV'; if sum_result = 1 and the frame is within 10 frames of a frame with sum_result = 2, return 'ReviewNG' (that is, the frame number where sum_result = 2 minus the frame number where sum_result = 1 is less than or equal to 10); if sum_result = 0, return 'Other'.
For example, this is the output I expect:
   frame  user1  user2  sum_result    Result
0      0      0      0           0     Other
1      1      0      0           0     Other
2      2      0      1           1  ReviewNG
3      3      1      1           2       ICV
4      4      1      0           1  ReviewNG
5      5      0      0           0     Other
This is my code:
def result_func(row):
    for i in range(0,len(df)):
        if row==2:
            return('|ICV')
        elif row==1 & (df['frame'][i]-df.loc[df['sum_result']==2,'frame'].iloc[0]<=10 | df['frame'][i]-df.loc[df['sum_result']==2,'frame'].iloc[-1]<=10):
            return('ReviewNG-ICV')
        elif row==0:
            return('Other')
        else:
            return ""
and I apply it to df:
df['result']=df['sum_result'].apply(lambda row: result_func(row))
But I get an error:
IndexError: single positional indexer is out-of-bounds
I understand that the error occurs when no row in my df has sum_result = 2. How can I fix my function?
Upvotes: 0
Views: 475
Reputation: 1139
def result_func(row):
    if row['sum_result'] == 2:
        return "ICV"
    elif row['sum_result'] == 1:
        new_frame = df.loc[df['sum_result']==2,'frame']
        if not new_frame.empty and (row['frame']-new_frame.iloc[0] <=10 or row['frame']-new_frame.iloc[-1] <=10):
            return('ReviewNG-ICV')
    elif row['sum_result'] == 0:
        return "Other"
    return "OTHER UNDEFINED VALUES"

df['result']=df[['frame','sum_result']].apply(result_func,axis=1)
If you do not want to recompute new_frame for every row, you can compute it once and pass it to the function through the args parameter of apply:
def result_func(row,new_frame):
    if row['sum_result'] == 2:
        return "ICV"
    elif row['sum_result'] == 1:
        if not new_frame.empty and (row['frame']-new_frame.iloc[0] <=10 or row['frame']-new_frame.iloc[-1] <=10):
            return('ReviewNG-ICV')
    elif row['sum_result'] == 0:
        return "Other"
    return "OTHER UNDEFINED VALUES"

new_frame = df.loc[df['sum_result']==2,'frame']
df['result']=df[['frame','sum_result']].apply(result_func,args=(new_frame,),axis=1)
Output
from tabulate import tabulate
print(tabulate(df, headers='keys', tablefmt='psql'))
+----+---------+---------+---------+--------------+--------------+
| | frame | user1 | user2 | sum_result | result |
|----+---------+---------+---------+--------------+--------------|
| 0 | 0 | 0 | 0 | 0 | Other |
| 1 | 1 | 0 | 0 | 0 | Other |
| 2 | 100 | 0 | 1 | 1 | test |
| 3 | 88 | 1 | 1 | 2 | ICV |
| 4 | 4 | 1 | 0 | 1 | ReviewNG-ICV |
| 5 | 5 | 0 | 0 | 0 | Other |
| 6 | 18 | 1 | 1 | 2 | ICV |
+----+---------+---------+---------+--------------+--------------+
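As a quick check, the version that takes new_frame as an argument can be run on the question's data, once as given and once with the sum_result = 2 rows removed. This is only a sketch: the helper name `label` and the two sample dataframes are assumptions added for illustration, and the function body is copied from the code above.

```python
import pandas as pd

def result_func(row, new_frame):
    if row['sum_result'] == 2:
        return "ICV"
    elif row['sum_result'] == 1:
        # The .empty guard short-circuits, so .iloc is never called on an empty series
        if not new_frame.empty and (row['frame'] - new_frame.iloc[0] <= 10
                                    or row['frame'] - new_frame.iloc[-1] <= 10):
            return 'ReviewNG-ICV'
    elif row['sum_result'] == 0:
        return "Other"
    return "OTHER UNDEFINED VALUES"

def label(df):
    # `label` is a hypothetical helper: compute new_frame once, then apply row-wise
    new_frame = df.loc[df['sum_result'] == 2, 'frame']
    return df[['frame', 'sum_result']].apply(result_func, args=(new_frame,), axis=1)

# The question's input data
with_icv = pd.DataFrame({
    'frame': [0, 1, 2, 3, 4, 5],
    'user1': [0, 0, 0, 1, 1, 0],
    'user2': [0, 0, 1, 1, 0, 0],
    'sum_result': [0, 0, 1, 2, 1, 0],
})
# Same data with the sum_result == 2 row removed (the case that raised IndexError)
without_icv = with_icv[with_icv['sum_result'] != 2]

labels_with = label(with_icv).tolist()
labels_without = label(without_icv).tolist()
print(labels_with)
print(labels_without)
```

Note that when no row has sum_result = 2, rows with sum_result = 1 fall through to the final return, so they get "OTHER UNDEFINED VALUES" rather than raising an error.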
Hope it helps
Upvotes: 1
Reputation: 12524
If no row in the dataframe meets the condition sum_result = 2, then the series df.loc[df['sum_result']==2,'frame'] is empty. In this case, you cannot access its first or last element with df.loc[df['sum_result']==2,'frame'].iloc[0] or df.loc[df['sum_result']==2,'frame'].iloc[-1]. This is what triggers your IndexError.
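The behaviour described above can be reproduced in a few lines. This is a minimal sketch on a made-up three-row dataframe (the data and the names `icv_frames` and `error_message` are assumptions for illustration):

```python
import pandas as pd

# No row has sum_result == 2, so the filtered series is empty
df = pd.DataFrame({'frame': [0, 1, 2], 'sum_result': [0, 1, 0]})
icv_frames = df.loc[df['sum_result'] == 2, 'frame']
print(icv_frames.empty)  # True

error_message = None
try:
    icv_frames.iloc[0]  # positional access on an empty series
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```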
So, first of all, you should check whether df.loc[df['sum_result']==2,'frame'] is actually empty with:
if df.loc[df['sum_result']==2,'frame'].empty:
    ...
An example based on your code could be:
import pandas as pd

df = pd.read_csv('data.csv')

def result_func(row):
    # Every branch returns, so only the first iteration (i = 0) is ever evaluated
    for i in range(0,len(df)):
        if row==2:
            return('ICV')
        elif row==1:
            # Guard: .iloc[0] and .iloc[-1] raise IndexError on an empty series
            if df.loc[df['sum_result']==2,'frame'].empty:
                return ('No sum_result==2')
            else:
                # `or` instead of `|`: `|` binds tighter than `<=` and would change the comparison
                if (df['frame'][i]-df.loc[df['sum_result']==2,'frame'].iloc[0]<=10 or df['frame'][i]-df.loc[df['sum_result']==2,'frame'].iloc[-1]<=10):
                    return('ReviewNG-ICV')
                else:
                    return('To be defined')
        elif row==0:
            return('Other')
        else:
            return ""

df['result']=df['sum_result'].apply(result_func)
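As an alternative to a row-wise apply, the labelling can also be sketched in vectorized form with numpy.select, which handles the "no sum_result = 2" case without any .iloc access. This is not the approach from the question or the answers; the names `label_frames` and `max_gap` are hypothetical. It also assumes that "within 10 frames" means the absolute distance to the nearest frame with sum_result = 2, which the question's wording leaves open.

```python
import numpy as np
import pandas as pd

def label_frames(df, max_gap=10):
    """Label rows by sum_result; `label_frames` and `max_gap` are hypothetical names."""
    frames = df['frame'].to_numpy()
    sums = df['sum_result'].to_numpy()
    icv_frames = frames[sums == 2]

    if icv_frames.size == 0:
        # No ICV frames at all: nothing can be "near" one
        near_icv = np.zeros(len(df), dtype=bool)
    else:
        # Assumption: absolute distance from each frame to its nearest ICV frame
        gaps = np.abs(frames[:, None] - icv_frames[None, :]).min(axis=1)
        near_icv = gaps <= max_gap

    conditions = [sums == 2, (sums == 1) & near_icv, sums == 0]
    choices = ['ICV', 'ReviewNG', 'Other']
    # Rows matching no condition get an empty string
    return np.select(conditions, choices, default='')

df = pd.DataFrame({
    'frame': [0, 1, 2, 3, 4, 5],
    'sum_result': [0, 0, 1, 2, 1, 0],
})
df['result'] = label_frames(df)
print(df)
```

The pairwise distance matrix has one column per ICV frame, so this sketch suits moderate data sizes; for very large inputs a sorted lookup would use less memory.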
Upvotes: 1