For loop with factor data

Question

I have two vectors of factor data with equal length. For example:

observed=c("a", "b", "c", "a", "b", "c", "a")
predicted=c("a", "a", "b", "b", "b", "c", "c")

Ultimately, I am trying to generate a classification matrix showing the number of times each factor is correctly predicted. This would look like the following for the example:

Note that the table() command doesn't work here because I have 11 different factor levels, and the output would be 11x11 instead of 11x2. My plan is to create three vectors and combine them into a data frame.

First, a vector of the unique factor values in the existing vectors. This is simple enough:

names=unique(observed)

Next, a vector of values showing the number of correct predictions. This is where I am running into trouble. I can get the number of correct predictions for an individual factor like so:

correct.a=sum(predicted[which(observed == "a")] == "a")

But this is cumbersome to repeat for every factor level and then combine into a vector like

correct=c(correct.a, correct.b, correct.c)

Is there a way to use a loop (or other strategy that you can think of) to improve this process?

Also note that the final vector I would create would be something like this:

incorrect.a=sum(observed == "a")-correct.a

Bulat · Accepted Answer

I would suggest using data.table, which gives an explicit, clean way to define your results:

library(data.table)
observed=c("a", "b", "c", "a", "b", "c", "a")
predicted=c("a", "a", "b", "b", "b", "c", "c")

dt <- data.table(observed, predicted)

res <- dt[, .(
  T = sum(observed == predicted), 
  F = sum(observed != predicted)), 
  observed
]

res
#   observed T F
# 1:        a 1 2
# 2:        b 1 1
# 3:        c 1 1
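
For comparison, the same counts can probably be obtained in base R without a hand-written loop. The following is only a sketch: it assumes observed and predicted are character vectors (convert with as.character() first if they are factors), and the names lvls, correct, incorrect and result are illustrative rather than taken from the question.

```r
observed  <- c("a", "b", "c", "a", "b", "c", "a")
predicted <- c("a", "a", "b", "b", "b", "c", "c")

# Assumption: both vectors are character; use as.character() on factors first.
# Levels in order of first appearance (illustrative name: lvls).
lvls <- unique(observed)

# For each level, count the observations of that level that were predicted correctly.
correct <- sapply(lvls, function(l) sum(predicted[observed == l] == l))

# Incorrect = number of observations of the level minus the correct ones.
incorrect <- sapply(lvls, function(l) sum(observed == l)) - correct

# Combine the three vectors into one data frame, one row per level.
result <- data.frame(names = lvls, correct = correct, incorrect = incorrect,
                     row.names = NULL)
result
```

The sapply() calls replace the repeated correct.a, correct.b, ... lines from the question, so the approach should scale to 11 levels without extra code.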
