Msquare

Reputation: 835

Comparing two dataframe columns of booleans

I have two dataframes, one holding the actual rain condition and one holding the predicted rain condition. The actual-rain dataframe is constant, as it is a known result. It is given below.

actul = 

index  rain
Day1   True
Day2   False
Day3   True
Day4   True

The predicted-rain dataframe is given below. It keeps changing based on the prediction model used.

prdt = 

index  rain
Day1   False
Day2   True
Day3   True
Day4   False

I compute the prediction accuracy of the above prediction model as given below:

import numpy as np

# Number of days on which rain was predicted correctly
a = sum(np.where((actul['rain'] == True) & (prdt['rain'] == True), True, False))
# Number of days on which no-rain was predicted correctly
b = sum(np.where((actul['rain'] == False) & (prdt['rain'] == False), True, False))
# Number of days on which it rained but no rain was predicted (misses)
c = sum(np.where((actul['rain'] == True) & (prdt['rain'] == False), True, False))
# Number of days on which rain was predicted but it did not rain (false alarms)
d = sum(np.where((actul['rain'] == False) & (prdt['rain'] == True), True, False))

predt_per = (a + b) * 100 / (a + b + c + d)
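With the two sample dataframes above this evaluates to 25.0: only Day3 is classified correctly, so a = 1, b = 0, c = 2, d = 1. A self-contained version of the same computation, for reproduction:

import numpy as np
import pandas as pd

actul = pd.DataFrame({'rain': [True, False, True, True]},
                     index=['Day1', 'Day2', 'Day3', 'Day4'])
prdt = pd.DataFrame({'rain': [False, True, True, False]},
                    index=['Day1', 'Day2', 'Day3', 'Day4'])

a = sum(np.where((actul['rain'] == True) & (prdt['rain'] == True), True, False))
b = sum(np.where((actul['rain'] == False) & (prdt['rain'] == False), True, False))
c = sum(np.where((actul['rain'] == True) & (prdt['rain'] == False), True, False))
d = sum(np.where((actul['rain'] == False) & (prdt['rain'] == True), True, False))

predt_per = (a + b) * 100 / (a + b + c + d)
print(predt_per)  # 25.0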

My above code takes too much time to compute. Is there a better way to achieve the same result?

Now, the accepted answer below solved the problem above. However, something looks wrong in my code given below, because I am getting a 100% prediction percentage for all dataframes. My code is:

alldates_df = 

index       met1_r2    useful     met1_r2>0.5
0          0.824113     True        True
1          0.903828     True        True
2          0.500765     True        True
3          0.889757     True        True
4          0.890102     True        True
5          0.893995     True        True
6          0.933482     True        True
7          0.872847     True        True
8          0.913142     True        True
9          0.901424     True        True
10         0.910941     True        True
11         0.927310     True        True
12         0.934538     True        True
13         0.946092     True        True
14         0.653831     True        True
15         0.390702     True        False
16         0.878493     True        True
17         0.899739     True        True
18         0.938481     True        True
19      -850.978703     False       False
20       -21.802518     False       False

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

met1_detacu = []  # Method-1 detection accuracy at various settings
var_flset = np.arange(-5, 1, 0.01)  # various filter settings
for i in var_flset:
    # Re-predict 'useful' with the current threshold i on the met1_r2 score
    pdt_usefl = alldates_df.assign(result=alldates_df['met1_r2'] > i)
    # Two matching booleans sum to 0 (both False) or 2 (both True)
    x = pd.concat([alldates_df['useful'], pdt_usefl['result']], axis=1).sum(1).isin([0, 2]).mean() * 100
    met1_detacu.append(x)
plt.plot(var_flset, met1_detacu)

My above code runs, but I am getting 100% detection accuracy at all of the variable filter settings. Something is wrong here. Obtained plot: [image: a flat line at 100% detection accuracy across all filter settings]

Expected plot: [image: the expected detection-accuracy vs. filter-setting curve]

@WeNYoBen

Upvotes: 0

Views: 74

Answers (1)

BENY

Reputation: 323226

In your case, assuming index is the index of the df, we can use sum after concat, since True + True == 2 and False + False == 0:

# df1 and df2 here are the two boolean rain columns, actul['rain'] and prdt['rain']
pd.concat([df1, df2], axis=1).sum(1).isin([0, 2]).mean() * 100
25.0
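Equivalently, since two boolean columns agree exactly when their sum is 0 or 2, a direct equality comparison gives the same number. A minimal self-contained sketch using the sample data from the question:

import pandas as pd

actul = pd.DataFrame({'rain': [True, False, True, True]},
                     index=['Day1', 'Day2', 'Day3', 'Day4'])
prdt = pd.DataFrame({'rain': [False, True, True, False]},
                    index=['Day1', 'Day2', 'Day3', 'Day4'])

# True where actual and predicted match; the mean of booleans is the fraction correct
print((actul['rain'] == prdt['rain']).mean() * 100)  # 25.0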

Update

met1_detacu = []  # Method1_detection accuracy at various settings
var_flset = np.arange(-5, 1, 0.01)  # various filter settings
for i in var_flset:
    pdt_usefl = alldates_df.assign(result=alldates_df['met1_r2'] > i)
    x = pd.concat([alldates_df['useful'], pdt_usefl['result']], axis=1).sum(1).isin([0, 2]).mean() * 100
    met1_detacu.append(x)
plt.plot(var_flset, met1_detacu)
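As a side note, if the loop over the ~600 filter settings ever becomes a bottleneck, the same per-threshold accuracies can be computed in one shot with NumPy broadcasting. This is only a sketch, assuming useful is a genuine boolean column; two booleans are equal exactly when their sum is 0 or 2, so it is the same test as above:

import numpy as np

# rows = thresholds, columns = samples: evaluate every filter setting at once
preds = alldates_df['met1_r2'].to_numpy() > var_flset[:, None]
truth = alldates_df['useful'].to_numpy()
met1_detacu = (preds == truth).mean(axis=1) * 100  # one accuracy per threshold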

Upvotes: 1
