kamiks

Reputation: 184

Can't use dplyr::arrange() to sort a column whose name is in the form of a date in R

Does anyone know why dplyr's arrange() function cannot sort a column whose name is a date-like string?

Take a look at the example below:

library(dplyr)  # provides %>% and arrange()

rnames <- LETTERS[1:10]
set.seed(1)
values <- runif(10, 0, 10) %>%
  round(1) %>%
  data.frame(., row.names = rnames) %>%
  `colnames<-`("2022-03-01")
values %>% dplyr::arrange("2022-03-01")

If you run that block of code, you can see that the column is not sorted:

 2022-03-01
A        2.7
B        3.7
C        5.7
D        9.1
E        2.0
F        9.0
G        9.4
H        6.6
I        6.3
J        0.6

There are a variety of ways to fix the code in order to allow for sorting, including, but not limited to: (i) using dplyr::arrange_all(), (ii) nesting dplyr::across() within the arrange() call, (iii) changing the column name to one not resembling a date, or (iv) using base R's order() function in conjunction with brackets.
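For reference, workaround (iv) might look like the following. This is a minimal sketch that rebuilds the example data with base R only; the variable name `sorted` is just illustrative.

```r
# Rebuild the example data frame from the question (base R only).
set.seed(1)
values <- data.frame(round(runif(10, 0, 10), 1), row.names = LETTERS[1:10])
colnames(values) <- "2022-03-01"

# order() returns the row permutation that sorts the column;
# drop = FALSE keeps the one-column result a data frame
# instead of collapsing it to a plain vector.
sorted <- values[order(values[["2022-03-01"]]), , drop = FALSE]
print(sorted)
```

Here the string is used only to look the column up with `[[`, so the date-like name causes no trouble.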

My question is why arrange() silently fails to sort (it does not throw an error) even though the column name is a character string:

> typeof(colnames(values))
[1] "character"

I ask because people working with stock or other time series data sometimes end up with dates as column names. If they need to sort by such columns, this quirk could produce unexpected results.

Upvotes: 2

Views: 818

Answers (1)

akrun

Reputation: 887531

Instead of double-quoting the column name, wrap it in backticks:

library(dplyr)
values %>% 
   dplyr::arrange(`2022-03-01`) 

-output

   2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4
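As a rough illustration of why the double-quoted version leaves the rows untouched: arrange() evaluates its arguments as expressions in the context of the data, so "2022-03-01" is read as a constant string rather than a column reference. Every row then gets the same sort key and the original order is kept. The sketch below rebuilds the example data; `by_string` and `by_column` are illustrative names.

```r
library(dplyr)

# Rebuild the example data frame from the question.
set.seed(1)
values <- data.frame(round(runif(10, 0, 10), 1), row.names = LETTERS[1:10])
colnames(values) <- "2022-03-01"

# A double-quoted name is a constant string: all rows tie on the sort key,
# so the column comes back in its original order.
by_string <- values %>% arrange("2022-03-01")

# A backquoted name refers to the column itself, so the rows are sorted.
by_column <- values %>% arrange(`2022-03-01`)

# Compare the column before and after each call.
print(identical(by_string[["2022-03-01"]], values[["2022-03-01"]]))
print(is.unsorted(by_column[["2022-03-01"]]))
```

The same thing would happen with any quoted string, date-like or not. The date-like name only matters because it is not a syntactic R name and so has to be backquoted to be used as a symbol.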

If we want to pass the name as a string, either wrap it in across()

values %>%
   dplyr::arrange(across("2022-03-01"))

-output

   2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or convert the string to a symbol and unquote it with !!

values %>%
  dplyr::arrange(!! rlang::sym("2022-03-01"))

-output

   2022-03-01
J        0.6
E        2.0
A        2.7
B        3.7
C        5.7
I        6.3
H        6.6
F        9.0
D        9.1
G        9.4

Or index the .data pronoun with the string

values %>% 
  dplyr::arrange(.data[["2022-03-01"]])
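When the column name arrives in a variable (for example, a date string produced elsewhere in a script), the .data form can be wrapped in a small function. The helper name `sort_by_name` below is hypothetical and not part of dplyr; this is only a sketch of one way to package it.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): sort a data frame by a
# column whose name is supplied as a single string.
sort_by_name <- function(df, col) {
  # Fail early if the input is not one existing column name.
  stopifnot(is.character(col), length(col) == 1, col %in% names(df))
  df %>% arrange(.data[[col]])
}

# Rebuild the example data frame from the question.
set.seed(1)
values <- data.frame(round(runif(10, 0, 10), 1), row.names = LETTERS[1:10])
colnames(values) <- "2022-03-01"

sorted <- sort_by_name(values, "2022-03-01")
print(sorted)
```

The upfront check is a design choice for this sketch: it turns a mistyped name into an immediate, readable error from the helper itself.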

Upvotes: 4
