dcasblue

Sorting in dplyr produces incorrect output

The arrange() function in dplyr produces incorrect results.

library(dplyr)
x <- as.data.frame(cbind(name=c("A","B","C","D"), val=c(0.032, 0.077, 0.4, 0.0001)))
x.1 <- x %>% arrange(val)
x.2 <- x %>% arrange(desc(val))

The outputs are:

> x
  name   val
1    A 0.032
2    B 0.077
3    C   0.4
4    D 1e-04

> x.1
  name   val
1    A 0.032
2    B 0.077
3    C   0.4
4    D 1e-04

> x.2
  name   val
1    D 1e-04
2    C   0.4
3    B 0.077
4    A 0.032

Both the ascending and the descending sort produce incorrect output. What am I doing wrong here? Thank you.

Answers (1)

Rich Scriven

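as.data.frame(cbind()) is what you are doing wrong there. cbind() on vectors builds a matrix, and a matrix can hold only one type, so everything is converted to character; as.data.frame() then converts those character columns to factor. arrange() therefore sorts val as text (factor levels), not as numbers, which is why "1e-04" ends up after "0.4". (Since R 4.0.0, stringsAsFactors defaults to FALSE, so the columns stay character instead of becoming factors, but the sort is still alphabetical.) Have a look: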

str(x)
# 'data.frame': 4 obs. of  2 variables:
#  $ name: Factor w/ 4 levels "A","B","C","D": 1 2 3 4
#  $ val : Factor w/ 4 levels "0.032","0.077",..: 1 2 3 4
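To see the coercion directly, here is a minimal base-R sketch (not part of the original answer) that inspects the matrix cbind() produces and compares sorting its val column as text versus as numbers. The variable name m is purely illustrative.

```r
# cbind() on vectors returns a matrix; a matrix holds a single type,
# so the numeric values are coerced to character strings
m <- cbind(name = c("A", "B", "C", "D"), val = c(0.032, 0.077, 0.4, 0.0001))
typeof(m)    # "character"
m[, "val"]   # "0.032" "0.077" "0.4" "1e-04"

# Sorting strings compares them character by character, so "1e-04" lands last
sort(m[, "val"])

# Converting back to numeric restores the expected order (smallest, 0.0001, first)
sort(as.numeric(m[, "val"]))
```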

I don't know where people are learning this method of creating data frames, but it's terrible practice and should never be used.

Use data.frame() to create data frames; that's what it's there for. (With dplyr you can also use tibble(); the older data_frame() is now deprecated.)

library(dplyr)
x <- data.frame(name=c("A","B","C","D"), val=c(0.032, 0.077, 0.4, 0.0001))
x.1 <- x %>% arrange(val)
x.2 <- x %>% arrange(desc(val))

x.1
#   name    val
# 1    D 0.0001
# 2    A 0.0320
# 3    B 0.0770
# 4    C 0.4000

x.2
#   name    val
# 1    C 0.4000
# 2    B 0.0770
# 3    A 0.0320
# 4    D 0.0001
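If a data frame has already been built with as.data.frame(cbind()), one way to repair it (a sketch, not part of the original answer) is to convert val back to numeric before sorting. Going through as.character() first matters: on a factor column (the pre-R 4.0 behaviour shown in the str(x) output), as.numeric() alone would return the level codes 1, 2, 3, 4 rather than the values. The same call also works when val is character.

```r
library(dplyr)

# Rebuild the problematic data frame from the question
x <- as.data.frame(cbind(name = c("A", "B", "C", "D"),
                         val = c(0.032, 0.077, 0.4, 0.0001)))

# Convert via as.character() first; as.numeric() on a factor gives level codes
x <- x %>% mutate(val = as.numeric(as.character(val)))

x.1 <- x %>% arrange(val)        # name order: D, A, B, C
x.2 <- x %>% arrange(desc(val))  # name order: C, B, A, D
```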
