Achal Neupane

How to sort vector elements in R based on values before the delimiter

I have a vector called myvec. I want to sort its elements by the number before the first -. Any suggestions?

myvec <- c("2-1_16S_S217_R1_001.fastq", "2-2_16S_S226_R1_001.fastq", "3-1_16S_S234_R1_001.fastq", 
"3-2_16S_S242_R1_001.fastq", "11-1_16S_S199_R1_001.fastq", "1-1_16S_S197_R1_001.fastq", 
"11-2_16S_S209_R1_001.fastq", "1-2_16S_S207_R1_001.fastq")

When I run sort(myvec), I get:

[1] "1-1_16S_S197_R1_001.fastq"  "1-2_16S_S207_R1_001.fastq"  "11-1_16S_S199_R1_001.fastq" "11-2_16S_S209_R1_001.fastq"
[5] "2-1_16S_S217_R1_001.fastq"  "2-2_16S_S226_R1_001.fastq"  "3-1_16S_S234_R1_001.fastq"  "3-2_16S_S242_R1_001.fastq"

I also tried:

require('gtools')
mixedsort(myvec)

which gives:

[1] "1-2_16S_S207_R1_001.fastq"  "1-1_16S_S197_R1_001.fastq"  "2-2_16S_S226_R1_001.fastq"  "2-1_16S_S217_R1_001.fastq" 
[5] "3-2_16S_S242_R1_001.fastq"  "3-1_16S_S234_R1_001.fastq"  "11-2_16S_S209_R1_001.fastq" "11-1_16S_S199_R1_001.fastq"

The result I want:

1-1_16S_S197_R1_001.fastq
1-2_16S_S207_R1_001.fastq
2-1_16S_S217_R1_001.fastq
2-2_16S_S226_R1_001.fastq
3-1_16S_S234_R1_001.fastq
3-2_16S_S242_R1_001.fastq
11-1_16S_S199_R1_001.fastq
11-2_16S_S209_R1_001.fastq

Answers (3)

akrun

One option is to extract the leading number with parse_number (from readr) and order on it:

myvec[order(readr::parse_number(myvec))]
#[1] "1-1_16S_S197_R1_001.fastq"  "1-2_16S_S207_R1_001.fastq"  
#[3] "2-1_16S_S217_R1_001.fastq"  "2-2_16S_S226_R1_001.fastq" 
#[5] "3-1_16S_S234_R1_001.fastq"  "3-2_16S_S242_R1_001.fastq"
#[7] "11-1_16S_S199_R1_001.fastq" "11-2_16S_S209_R1_001.fastq"
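Note that order() is stable: ties on the first number keep their input order, and in myvec the second number already happens to be ascending within each group. A minimal base-R sketch that does not rely on that, assuming every name starts with <integer>-<integer>_ (the names first, second and sorted are just illustrative), orders on both numbers explicitly:

```r
myvec <- c("2-1_16S_S217_R1_001.fastq", "2-2_16S_S226_R1_001.fastq", "3-1_16S_S234_R1_001.fastq",
           "3-2_16S_S242_R1_001.fastq", "11-1_16S_S199_R1_001.fastq", "1-1_16S_S197_R1_001.fastq",
           "11-2_16S_S209_R1_001.fastq", "1-2_16S_S207_R1_001.fastq")

# Assumption: every name starts with "<integer>-<integer>_"
first  <- as.integer(sub("^(\\d+)-.*", "\\1", myvec))       # number before the "-"
second <- as.integer(sub("^\\d+-(\\d+)_.*", "\\1", myvec))  # number between "-" and "_"

# order() sorts by `first` and breaks ties with `second`
sorted <- myvec[order(first, second)]
print(sorted)
```

With the input reversed, e.g. rev(myvec), ordering on the first number alone would list 1-2 before 1-1, while ordering on both keys (after recomputing first and second) still gives 1-1, 1-2, 2-1, ..., 11-2.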

Or, if we want to sort on the alphanumeric characters before the first -, extract that substring with str_extract (from stringr) and use mixedorder (from gtools) to order the vector:

library(gtools)
library(stringr)
myvec[mixedorder(str_extract(myvec, "^[^-]+"))]

Julius Vainora

We may also use str_sort from stringr:

stringr::str_sort(myvec, numeric = TRUE)
# [1] "1-1_16S_S197_R1_001.fastq" 
# [2] "1-2_16S_S207_R1_001.fastq" 
# [3] "2-1_16S_S217_R1_001.fastq" 
# [4] "2-2_16S_S226_R1_001.fastq" 
# [5] "3-1_16S_S234_R1_001.fastq" 
# [6] "3-2_16S_S242_R1_001.fastq" 
# [7] "11-1_16S_S199_R1_001.fastq"
# [8] "11-2_16S_S209_R1_001.fastq"

Jilber Urbina

> myvec[order(as.numeric(sub("(^\\d+).*", "\\1", myvec)))]

[1] "1-1_16S_S197_R1_001.fastq" 
[2] "1-2_16S_S207_R1_001.fastq" 
[3] "2-1_16S_S217_R1_001.fastq" 
[4] "2-2_16S_S226_R1_001.fastq" 
[5] "3-1_16S_S234_R1_001.fastq" 
[6] "3-2_16S_S242_R1_001.fastq" 
[7] "11-1_16S_S199_R1_001.fastq"
[8] "11-2_16S_S209_R1_001.fastq"
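The as.numeric() call is what makes this work: sub() returns the captured prefix as a character vector, and ordering those strings as text gives the same 1, 11, 2, 3 grouping as the sort(myvec) output in the question. A short base-R sketch comparing the two (keys, text_order and numeric_order are illustrative names):

```r
myvec <- c("2-1_16S_S217_R1_001.fastq", "2-2_16S_S226_R1_001.fastq", "3-1_16S_S234_R1_001.fastq",
           "3-2_16S_S242_R1_001.fastq", "11-1_16S_S199_R1_001.fastq", "1-1_16S_S197_R1_001.fastq",
           "11-2_16S_S209_R1_001.fastq", "1-2_16S_S207_R1_001.fastq")

# Capture the digits at the start of each name (still a character vector)
keys <- sub("(^\\d+).*", "\\1", myvec)

text_order    <- myvec[order(keys)]              # compared as strings: "1" < "11" < "2" < "3"
numeric_order <- myvec[order(as.numeric(keys))]  # compared as numbers: 1 < 2 < 3 < 11

print(text_order)
print(numeric_order)
```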
