Sorting in dplyr produces incorrect output

Question

arrange() in dplyr produces an incorrect result.

library(dplyr)
x <- as.data.frame(cbind(name=c("A","B","C","D"), val=c(0.032, 0.077, 0.4, 0.0001)))
x.1 <- x %>% arrange(val)
x.2 <- x %>% arrange(desc(val))

The outputs are:

> x
   name  val
1    A   0.032
2    B   0.077
3    C   0.4
4    D   1e-04

> x.1
   name  val
1    A   0.032
2    B   0.077
3    C   0.4
4    D   1e-04

> x.2
   name     val
1    D   1e-04
2    C   0.4
3    B   0.077
4    A   0.032

Both the ascending and the descending sort produce incorrect output. What am I doing wrong here? Thank you.

Rich Scriven · Accepted Answer

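as.data.frame(cbind()) is what you are doing wrong there. cbind() builds a matrix, which can hold only one type, so everything is converted to character. as.data.frame() then converts those character columns to factors (the default before R 4.0.0). arrange() therefore orders val by its factor levels, which are the strings in alphabetical order, so "1e-04" ends up after "0.4". Have a look: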

str(x)
# 'data.frame': 4 obs. of  2 variables:
#  $ name: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
#  $ val : Factor w/ 4 levels "0.032","0.077",..: 1 2 3 4
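
The coercion can be checked in isolation. The following is a minimal sketch in base R only; the variable names m and vals_chr are introduced here for illustration and are not part of the question.

```r
# cbind() on plain vectors returns a matrix, and a matrix holds one type,
# so the numbers are coerced to character strings.
m <- cbind(name = c("A", "B", "C", "D"), val = c(0.032, 0.077, 0.4, 0.0001))
typeof(m)                  # "character"

# The val column now holds strings; 0.0001 was written as "1e-04".
vals_chr <- unname(m[, "val"])

# Sorting strings compares them character by character, so "1e-04" comes last.
sort(vals_chr)             # "0.032" "0.077" "0.4" "1e-04"

# Sorting the same values as numbers gives the order the question expected.
sort(as.numeric(vals_chr)) # 0.0001 0.0320 0.0770 0.4000
```
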

I don't know where people are learning this method of creating data frames, but it's terrible practice and should never be used.

Use data.frame() to create data frames; that's why it's there. When using dplyr, there is also data_frame(), which newer versions deprecate in favor of tibble().

library(dplyr)
x <- data.frame(name=c("A","B","C","D"), val=c(0.032, 0.077, 0.4, 0.0001))
x.1 <- x %>% arrange(val)
x.2 <- x %>% arrange(desc(val))

x.1
#   name    val
# 1    D 0.0001
# 2    A 0.0320
# 3    B 0.0770
# 4    C 0.4000

x.2
#   name    val
# 1    C 0.4000
# 2    B 0.0770
# 3    A 0.0320
# 4    D 0.0001
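
If a data frame has already been built the wrong way and cannot easily be recreated, the column can be repaired in place. This is a sketch using base R only, not part of the original answer; it assumes every value in val is a valid number written as text.

```r
# Recreate the problematic data frame from the question.
x <- as.data.frame(cbind(name = c("A", "B", "C", "D"),
                         val = c(0.032, 0.077, 0.4, 0.0001)))

# Convert via as.character() first: calling as.numeric() directly on a factor
# returns the integer level codes (1, 2, 3, 4), not the original values.
x$val <- as.numeric(as.character(x$val))

# With a numeric column, ordering works as expected (D, A, B, C).
x[order(x$val), ]
```

The as.character() step is harmless when val is already a character column, so the same line works whether or not the strings were turned into factors.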
