I have the following list:
datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx", "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")
which I want to order by the last number, and then by the first number (the date), so that it looks like:
"20200122_1505_1.xlsx" "20191107_1545_28.xlsx" "20200127_1505_28.xlsx" "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx" "20191108_1104_99.xlsx" "20191102_1520_102.xlsx"
I have been playing around with StrReverse, so I could then just order it normally, but unfortunately, it of course also reverses the number. I tried to split the string first:
split=str_split(datalist, "_")
but I don't know how to continue. The number I want to order by can have 1, 2 or 3 digits and can also be followed by a B (as in the example). Does anyone know how to do this? Thanks in advance!
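One way to continue from the str_split() attempt, sketched under the assumption that every name has exactly three "_"-separated parts as in the example: split into a matrix, pull the leading digits out of the third part, and pass both keys to base R's order(). The names parts, date_key and last_key are only illustrative.

```r
library(stringr)

datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx",
              "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

# Split each name on "_" into a character matrix (assumes exactly 3 parts per name)
parts <- str_split(datalist, "_", simplify = TRUE)

# First part is the date (YYYYMMDD), which sorts correctly as a number
date_key <- as.numeric(parts[, 1])

# Third part looks like "28.xlsx" or "41B.xlsx": keep only its leading digits
last_key <- as.numeric(str_extract(parts[, 3], "^\\d+"))

# Order by the last number first; ties are broken by the date
sorted <- datalist[order(last_key, date_key)]
print(sorted)
```

order() accepts several keys and uses each later key only to break ties in the earlier ones, which is what puts 20191107_1545_28.xlsx ahead of 20200127_1505_28.xlsx.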
One stringr option could be:
library(stringr)
datalist[str_order(str_extract_all(datalist, "\\d+", simplify = TRUE)[, 3], numeric = TRUE)]
[1] "20200122_1505_1.xlsx" "20191107_1545_28.xlsx" "20200127_1505_28.xlsx"
[4] "20200124_1505_41B.xlsx" "20191108_1520_95.xlsx" "20191108_1104_99.xlsx"
[7] "20191102_1520_102.xlsx"
Or a more flexible option, which uses the last run of digits however many runs a name contains:
datalist[str_order(sapply(str_extract_all(datalist, "\\d+"), tail, 1), numeric = TRUE)]
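Both str_order() calls rank on the last number only, so the two files ending in 28 are not explicitly ordered by date. A sketch that keeps the flexible "last digit run" idea but adds the first digit run (the date) as a tie-breaker, swapping str_order() for base order(), could look like this (last_num and first_num are illustrative names):

```r
library(stringr)

datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx",
              "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

# List of digit runs per name; the number of runs may vary between names
runs <- str_extract_all(datalist, "\\d+")

# Primary key: the last digit run ("41B" contributes 41)
last_num <- as.numeric(vapply(runs, function(x) x[length(x)], character(1)))
# Secondary key: the first digit run, i.e. the date
first_num <- as.numeric(vapply(runs, function(x) x[1], character(1)))

sorted <- datalist[order(last_num, first_num)]
print(sorted)
```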
If you do want to order by multiple numbers, you can add dplyr:
library(dplyr)
library(stringr)
# One column per digit run: V1 = date, V2 = time, V3 = last number
bind_cols(datalist = datalist,
          as.data.frame(str_extract_all(datalist, "\\d+", simplify = TRUE))) %>%
  mutate_at(vars(starts_with("V")), ~ as.numeric(as.character(.))) %>%
  arrange(V3, V1)
datalist V1 V2 V3
<chr> <dbl> <dbl> <dbl>
1 20200122_1505_1.xlsx 20200122 1505 1
2 20191107_1545_28.xlsx 20191107 1545 28
3 20200127_1505_28.xlsx 20200127 1505 28
4 20200124_1505_41B.xlsx 20200124 1505 41
5 20191108_1520_95.xlsx 20191108 1520 95
6 20191108_1104_99.xlsx 20191108 1104 99
7 20191102_1520_102.xlsx 20191102 1520 102
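mutate_at() is superseded in dplyr 1.0 and later. A sketch of the same two-key sort that builds only the two keys it needs, assuming the date is the leading digit run and the sort number is the digit run right after the last underscore (the column names date and last_num are illustrative):

```r
library(dplyr)
library(stringr)

datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx",
              "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

sorted <- tibble(datalist = datalist) %>%
  mutate(
    # Leading digits of the name: the date
    date = as.numeric(str_extract(datalist, "^\\d+")),
    # Digits right after the last "_" (capture group 1), ignoring a suffix like "B.xlsx"
    last_num = as.numeric(str_match(datalist, "_(\\d+)[^_]*$")[, 2])
  ) %>%
  arrange(last_num, date) %>%
  pull(datalist)

print(sorted)
```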
I think this does the trick. Note that it sorts only by the numeric part and ignores any letters. It handles letters attached to the end of the last number, since that's how the data looks, but the regular expression can be adapted to other layouts.
library(data.table)
datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx", "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx", "20191102_1520_102.xlsx")
dt <- data.table('datalist' = datalist)
# Date, time and last number are capture groups 1, 3 and 5; (.*) swallows any suffix such as "B.xlsx"
pat <- '(\\d{1,10})(_)(\\d{1,10})(_)(\\d{1,10})(.*)'
dt[, 'num1' := as.numeric(gsub(pattern = pat, x = datalist, replacement = '\\1'))]
dt[, 'num2' := as.numeric(gsub(pattern = pat, x = datalist, replacement = '\\3'))]
dt[, 'num3' := as.numeric(gsub(pattern = pat, x = datalist, replacement = '\\5'))]
# setkey() sorts the table by the last number, then by the date
setkey(dt, num3, num1)
print(dt$datalist)
Edit: forgot to coerce to numeric. Fixed.
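A variant of the data.table approach, sketched under the same three-part assumption: tstrsplit() does the splitting instead of one regex per column, and order() inside [ ] replaces setkey(), so dt itself is not reordered. The column names date, time, rest and last_num are illustrative.

```r
library(data.table)

datalist <- c("20191107_1545_28.xlsx", "20191108_1520_95.xlsx", "20191108_1104_99.xlsx",
              "20200127_1505_28.xlsx", "20200124_1505_41B.xlsx", "20200122_1505_1.xlsx",
              "20191102_1520_102.xlsx")

dt <- data.table(datalist = datalist)

# Split each name on "_" into three character columns
dt[, c("date", "time", "rest") := tstrsplit(datalist, "_", fixed = TRUE)]

# "41B.xlsx" -> 41: keep only the leading digits of the last part
dt[, last_num := as.numeric(sub("^(\\d+).*$", "\\1", rest))]

# Sort by the last number, then by the date, and return just the file names
sorted <- dt[order(last_num, as.numeric(date)), datalist]
print(sorted)
```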