tyeung

Reputation: 11

Is there a more elegant way to test if a forecasted model is correct?

I have a modeled/forecasted change and an actual change. The forecasted change is in a column named forecastHPIChange and the actual change is named HPIChange. It's in the following form:

        HPIChange forecastHPIChange
1              NA      1.547368e-02
2   -0.0026155187      1.485668e-02
3    0.0002906977      1.251108e-02
4   -0.0077877127      1.718729e-02
5    0.0200058841      2.143551e-02

For each of the 143 instances, I want to test whether the sign of the forecast matches the sign of the actual change. So there are really four cases:

  1. Forecast is positive and actual is positive -> correct-positive
  2. Forecast is negative and actual is negative -> correct-negative
  3. Forecast is positive and actual is negative -> incorrect-positive
  4. Forecast is negative and actual is positive -> incorrect-negative

To check this, I've hacked together the following code. I could feed the results into a data frame, but is there a more elegant way to do this check?

library(dplyr)

# correct-positive
data1 %>%
  filter(forecastHPIChange > 0 & HPIChange > 0) %>%
  summarise(correct = n())

# correct-negative
data1 %>%
  filter(forecastHPIChange < 0 & HPIChange < 0) %>%
  summarise(correct = n())

# incorrect-negative
data1 %>%
  filter(forecastHPIChange < 0 & HPIChange > 0) %>%
  summarise(wrong = n())

# incorrect-positive
data1 %>%
  filter(forecastHPIChange > 0 & HPIChange < 0) %>%
  summarise(wrong = n())

Upvotes: 1

Views: 44

Answers (2)

Sandipan Dey

Reputation: 23109

Starting with the following data (I changed your example data slightly so that all four classes, TP, FP, TN and FN, are present):

 data1
      HPIChange forecastHPIChange
1            NA        0.01547368
2 -0.0026155187        0.01485668
3  0.0002906977        0.01251108
4 -0.0077877127       -0.01718729
5  0.0200058841       -0.02143551

# Transform data1 into data2, which holds only sign labels (+1 and -1).
# Note that an exact zero is labelled -1 here.
data2 <- as.data.frame(sapply(data1, function(x) ifelse(x > 0, 1, -1)))

table(data2)       

    forecastHPIChange
HPIChange  -1 1
       -1   1 1
        1   1 1

# Reading each cell as (HPIChange, forecastHPIChange):
#   ( 1,  1) = TP   ( 1, -1) = FN
#   (-1, -1) = TN   (-1,  1) = FP

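# using the package caret
library(caret)
# recent versions of caret require factors with identical levels, so convert the +1/-1 labels first
to_factor <- function(x) factor(x, levels = c(-1, 1))
confusionMatrix(to_factor(data2$forecastHPIChange), reference = to_factor(data2$HPIChange))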
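
If labels more readable than +1/-1 are wanted, the same cross-tabulation can be sketched in base R without caret. This is only an illustrative variant under stated assumptions: `sign_label` is a hypothetical helper name, and unlike the `ifelse` call above it treats an exact zero as missing rather than as negative.

```r
# Modified example data from this answer (one row per sign combination, plus an NA).
data1 <- data.frame(
  HPIChange         = c(NA, -0.0026155187, 0.0002906977, -0.0077877127, 0.0200058841),
  forecastHPIChange = c(0.01547368, 0.01485668, 0.01251108, -0.01718729, -0.02143551)
)

# Hypothetical helper: map a numeric vector to a factor with readable sign labels.
# sign() returns -1, 0 or 1; zeros and NAs fall outside the levels and become NA.
sign_label <- function(x) {
  factor(sign(x), levels = c(-1, 1), labels = c("negative", "positive"))
}

# table() drops NA by default, so the first row (actual change missing) is ignored.
counts <- table(
  actual   = sign_label(data1$HPIChange),
  forecast = sign_label(data1$forecastHPIChange)
)
counts
```

Each cell can then be read off by name, for example `counts["negative", "positive"]` for forecasts that were positive when the actual change was negative.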

Upvotes: 0

G. Grothendieck

Reputation: 269860

Try confusionMatrix in the caret package:

library(caret)

make_factor <- function(x) factor(sign(x), levels = c(-1, 1))
signs <- as.data.frame(lapply(data1, make_factor))
with(signs, confusionMatrix(forecastHPIChange, reference = HPIChange))

or using a pipeline:

library(purrr)

data1 %>%
  map_df(make_factor) %>%
  { confusionMatrix(.$forecastHPIChange, reference = .$HPIChange) }

Either gives:

Confusion Matrix and Statistics

          Reference
Prediction -1 1
        -1  0 0
        1   2 2

               Accuracy : 0.5             
                 95% CI : (0.0676, 0.9324)
    No Information Rate : 0.5             
    P-Value [Acc > NIR] : 0.6875          

                  Kappa : 0               
 Mcnemar's Test P-Value : 0.4795          

            Sensitivity : 0.0             
            Specificity : 1.0             
         Pos Pred Value : NaN             
         Neg Pred Value : 0.5             
             Prevalence : 0.5             
         Detection Rate : 0.0             
   Detection Prevalence : 0.0             
      Balanced Accuracy : 0.5        
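
The Accuracy line in this output is the share of non-missing rows whose signs agree, so it can be cross-checked without caret. A minimal base R sketch, assuming the same `data1` as in the note at the end of this answer (the names `agree` and `hit_rate` are illustrative only):

```r
# Original example data: every forecast is positive, the first actual change is NA.
data1 <- data.frame(
  HPIChange         = c(NA, -0.0026155187, 0.0002906977, -0.0077877127, 0.0200058841),
  forecastHPIChange = c(0.01547368, 0.01485668, 0.01251108, 0.01718729, 0.02143551)
)

# TRUE where forecast and actual have the same sign, NA where either value is missing.
agree <- sign(data1$forecastHPIChange) == sign(data1$HPIChange)

# Share of non-missing rows with matching signs.
hit_rate <- mean(agree, na.rm = TRUE)
hit_rate
```

For this input the result should agree with the Accuracy of 0.5 reported by `confusionMatrix`, since two of the four usable rows have matching signs.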

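For the input shown, not every sign occurs in both columns (all of the forecasts are positive), which is why make_factor sets the factor levels explicitly. If the actual input does contain both signs in each column, we could eliminate make_factor and just use sign instead (wrapped in factor() for caret versions that require factor inputs).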

Note: The input data1 in reproducible form used above is:

data1 <- structure(list(HPIChange = c(NA, -0.0026155187, 0.0002906977, 
-0.0077877127, 0.0200058841), forecastHPIChange = c(0.01547368, 
0.01485668, 0.01251108, 0.01718729, 0.02143551)), .Names = c("HPIChange", 
"forecastHPIChange"), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))
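
If only the four counts from the question are needed, and not the full set of caret statistics, they can also be produced in a single dplyr pipeline. The following is a sketch under assumptions rather than a definitive implementation: the column name `outcome` and the catch-all label "zero change" are made up for illustration, and rows with a missing value are dropped explicitly.

```r
library(dplyr)

# Same input as the reproducible data1 above.
data1 <- data.frame(
  HPIChange         = c(NA, -0.0026155187, 0.0002906977, -0.0077877127, 0.0200058841),
  forecastHPIChange = c(0.01547368, 0.01485668, 0.01251108, 0.01718729, 0.02143551)
)

outcome_counts <- data1 %>%
  # Drop rows where either value is missing so they are not mislabelled.
  filter(!is.na(HPIChange), !is.na(forecastHPIChange)) %>%
  # Label each row with one of the four cases from the question.
  mutate(outcome = case_when(
    forecastHPIChange > 0 & HPIChange > 0 ~ "correct-positive",
    forecastHPIChange < 0 & HPIChange < 0 ~ "correct-negative",
    forecastHPIChange > 0 & HPIChange < 0 ~ "incorrect-positive",
    forecastHPIChange < 0 & HPIChange > 0 ~ "incorrect-negative",
    TRUE ~ "zero change"  # hypothetical label for rows where either value is exactly 0
  )) %>%
  # One row per case that actually occurs, with its count in column n.
  count(outcome)

outcome_counts
```

Cases that never occur in the data are simply absent from the result, so with the all-positive forecasts shown here only the two positive-forecast cases appear.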

Upvotes: 2
