Reputation: 835
I have two dataframes, one denoting the actual rain condition and one the predicted rain condition. The actual rain dataframe is constant as it is a known result; it is given below.
actul =
index rain
Day1 True
Day2 False
Day3 True
Day4 True
The predicted rain dataframe is given below. It keeps changing based on the prediction model used.
prdt =
index rain
Day1 False
Day2 True
Day3 True
Day4 False
I compute the prediction accuracy of the above prediction model as follows:
import numpy as np

# Number of days on which rain was predicted correctly
a = sum(np.where(((actul['rain'] == True)&(prdt['rain']==True)),True,False))
# Number of days on which no-rain was predicted correctly
b = sum(np.where(((actul['rain'] == False)&(prdt['rain']==False)),True,False))
# Number of rainy days that were incorrectly predicted as no-rain
c = sum(np.where(((actul['rain'] == True)&(prdt['rain']==False)),True,False))
# Number of no-rain days that were incorrectly predicted as rain
d = sum(np.where(((actul['rain'] == False)&(prdt['rain']==True)),True,False))
predt_per = (a+b)*100/(a+b+c+d)
My above code is taking too much time to compute. Is there a better way to achieve the above result?
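For reference, the same four counts can also be written with plain boolean operations (a sketch, assuming both dataframes share the same index as in the example above):
# Sketch: actul['rain'] and prdt['rain'] are boolean Series with a common index
hits         = (actul['rain'] & prdt['rain']).sum()    # rain predicted correctly
correct_negs = (~actul['rain'] & ~prdt['rain']).sum()  # no-rain predicted correctly
misses       = (actul['rain'] & ~prdt['rain']).sum()   # rain days predicted as no-rain
false_alarms = (~actul['rain'] & prdt['rain']).sum()   # no-rain days predicted as rain
predt_per = (hits + correct_negs)*100/(hits + correct_negs + misses + false_alarms)
# equivalently, the fraction of days where actual and predicted agree:
predt_per = (actul['rain'] == prdt['rain']).mean()*100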
Update: the accepted answer below solved my original problem. However, something seems to be wrong in my code given below, because I am getting a 100% prediction percentage for all dataframes. My code is:
alldates_df =
index met1_r2 useful met1_r2>0.5
0 0.824113 True True
1 0.903828 True True
2 0.500765 True True
3 0.889757 True True
4 0.890102 True True
5 0.893995 True True
6 0.933482 True True
7 0.872847 True True
8 0.913142 True True
9 0.901424 True True
10 0.910941 True True
11 0.927310 True True
12 0.934538 True True
13 0.946092 True True
14 0.653831 True True
15 0.390702 True False
16 0.878493 True True
17 0.899739 True True
18 0.938481 True True
19 -850.978703 False False
20 -21.802518 False False
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

met1_detacu = []                  # Method1 detection accuracy at various settings
var_flset = np.arange(-5,1,0.01)  # various filter settings
for i in var_flset:
    pdt_usefl = alldates_df.assign(result=alldates_df['met1_r2']>i)
    x = pd.concat([alldates_df['useful'],pdt_usefl['result']],axis=1).sum(1).isin([0,2]).mean()*100
    met1_detacu.append(x)
plt.plot(var_flset,met1_detacu)
My above code runs without errors, but I am getting 100% detection accuracy at all the variable filter settings. Something is wrong here.
Obtained plot:
Expected plot is:
@WeNYoBen
Upvotes: 0
Views: 74
Reputation: 323226
In your case, assuming the index of one dataframe matches the index of the other, we can use sum after concat, since True + True == 2 and False + False == 0:
pd.concat([df1,df2],axis=1).sum(1).isin([0,2]).mean()*100
25.0
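For example, rebuilding the two rain columns from the question (a minimal sketch; df1 and df2 here stand for actul['rain'] and prdt['rain']):
import pandas as pd

idx = ['Day1', 'Day2', 'Day3', 'Day4']
df1 = pd.Series([True, False, True, True], index=idx, name='actual')      # actul['rain']
df2 = pd.Series([False, True, True, False], index=idx, name='predicted')  # prdt['rain']

# row sums: 2 -> both True, 0 -> both False, 1 -> disagreement
pd.concat([df1, df2], axis=1).sum(1).isin([0, 2]).mean()*100
# 25.0  (only Day3 agrees)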
Update
met1_detacu = []                  # Method1 detection accuracy at various settings
var_flset = np.arange(-5,1,0.01)  # various filter settings
for i in var_flset:
    pdt_usefl = alldates_df.assign(result=alldates_df['met1_r2']>i)
    x = pd.concat([alldates_df['useful'],pdt_usefl['result']],axis=1).sum(1).isin([0,2]).mean()*100
    met1_detacu.append(x)
plt.plot(var_flset,met1_detacu)
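The same sweep can also be written as a single comprehension, since "(met1_r2 > i) agrees with useful" is exactly what the isin([0,2]) check counts (a sketch using the columns shown in the question):
# percentage of rows where the thresholded result matches the useful flag
met1_detacu = [((alldates_df['met1_r2'] > i) == alldates_df['useful']).mean()*100
               for i in var_flset]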
Upvotes: 1