kongkkk

Reputation: 15

In R, why does order() work like this?

X <- matrix(c(10,9,
              1,4,
              9,1,
              5,10,
              3,10,
              3,6), ncol=2, byrow = TRUE)

y <-c("a","b","a",'c','c','b')

X_new <- matrix(c(6,4,
                  9,2,
                  7,2), ncol=2, byrow=TRUE)

knn<- function(train_x, train_y, test_x){
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  
  d1 <- dim(train_x)
  n1 <- d1[1] 
  p1 <- d1[2]
  
  d2 <- dim(test_x)
  n2 <- d2[1] 
  p2 <- d2[2]
  
  pred_y <- rep(0,n2)
  for (i in 1:n2) {
    X_temp = train_x - matrix(test_x[i,], nrow=nrow(train_x),ncol=ncol(train_x), byrow = TRUE)
    euc_dist =sqrt(rowSums(X_temp^2))
    print(euc_dist)
    print(order(euc_dist))
    pred_y[i] <- train_y[which.min(euc_dist)]
  }
  return (pred_y)
}

knn(X,y,X_new)

This prints the following:

[1] 6.403124 5.000000 4.242641 6.082763 6.708204 3.605551

[1] 6 3 2 4 1 5

[1] 7.071068 8.246211 1.000000 8.944272 10.000000 7.211103

[1] 3 1 6 2 4 5

[1] 7.615773 6.324555 2.236068 8.246211 8.944272 5.656854

[1] 3 6 2 1 4 5

[1] "b" "a" "a"

I think the first order() call should print "5 3 2 4 6 1".

"6 3 2 4 1 5" isn't what I expected. Is there something I'm missing?

Upvotes: 0

Views: 25

Answers (1)

WilliamGram

Reputation: 683

order() returns the indices that would sort the vector: its first element is the position of the smallest value, its second is the position of the next smallest, and so on. This is more easily shown than explained:

out <- c(6.403124, 5.000000, 4.242641, 6.082763, 6.708204, 3.605551)
order(out)
# [1] 6 3 2 4 1 5

So, as you say, it looks odd, because you expect c(2, 4, 5, 3, 1, 6) (or c(5, 3, 2, 4, 6, 1) in ascending order). But one common way of using order is x[order(x)], which returns the sorted vector:

out[order(out)]
# [1] 3.605551 4.242641 5.000000 6.082763 6.403124 6.708204

or

out[order(out, decreasing = TRUE)]
# [1] 6.708204 6.403124 6.082763 5.000000 4.242641 3.605551

which is pretty useful.
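
In the question's setting, the index form is also what you would want if you ever need more than the single nearest neighbour. A minimal sketch, reusing the first row of distances and the labels from the question; k = 3 is an arbitrary value chosen for illustration, not something from the original code:

```r
# Distances from the first test point to the six training points (from the question)
euc_dist <- c(6.403124, 5.000000, 4.242641, 6.082763, 6.708204, 3.605551)
y <- c("a", "b", "a", "c", "c", "b")

k <- 3                            # hypothetical choice of k
nearest <- order(euc_dist)[1:k]   # indices of the k smallest distances
nearest
# [1] 6 3 2
y[nearest]                        # labels of those neighbours
# [1] "b" "a" "b"

# The first index agrees with which.min(), which the question's knn() uses
order(euc_dist)[1] == which.min(euc_dist)
# [1] TRUE
```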

If you want to know each value's place in the sorted sequence (which is what you expected order to return), use rank:

rank(out)
# [1] 5 3 2 4 6 1

or

rank(-out)  # base R; desc() comes from dplyr, and rank(dplyr::desc(out)) gives the same result
# [1] 2 4 5 3 1 6
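
To make the difference between the two functions concrete, here is a small sketch using the same vector. The names o and r are only for illustration. It relies on the values having no ties; with ties, rank() averages by default and the equalities below need not hold.

```r
out <- c(6.403124, 5.000000, 4.242641, 6.082763, 6.708204, 3.605551)

o <- order(out)   # o[i] = position in `out` of the i-th smallest value
r <- rank(out)    # r[j] = place of out[j] in the sorted vector

o[1]              # the smallest value sits at position 6
# [1] 6
r[6]              # the value at position 6 is the smallest
# [1] 1

# With no ties, order and rank are inverse permutations of each other
all(order(o) == r)
# [1] TRUE
all(o[r] == seq_along(out))
# [1] TRUE
```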

Upvotes: 1
