ShanZhengYang

Reputation: 17631

How to sort strings with integers by numeric ordering?

I have a column in a data.table full of strings in the format string+integer, e.g.

string1, string2, string3, string4, string5,

When I use sort(), these strings end up in the wrong order:

string1, string10, string11, string12, string13, ..., string2, string20, 
string21, string22, string23, ....

How would I sort these to be in the order

string01, string02, string03, string04, string05, ... , string10, string11, 
string12, etc.

One method could be to add a leading 0 to each single-digit integer (1-9). I suspect you would extract the string part with str_extract(dt$string_column, "[a-z]+") and then pad each single-digit integer with a 0, somehow with sprintf()?

Upvotes: 2

Views: 182

Answers (4)

PKumar

Reputation: 11128

Assuming the strings look something like the ones below:

library(data.table)
library(stringr)

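xstring <- data.table(x = c("string1", "string11", "string2", "string10", "stringx"))
# Digits that follow "string" ("" when there are none, as in "stringx")
extracts <- str_extract(xstring$x, "(?<=string)(\\d*)")
# Prefix single digits with "0"; leave two-digit and empty matches unchanged
y_string <- ifelse(nchar(extracts) == 2 | extracts == "", extracts, paste0("0", extracts))
# Put the padded digits back into the original strings
fin_string <- str_replace(xstring$x, "(?<=string)(\\d*)", y_string)
sort(fin_string)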

Output:

> sort(fin_string)
[1] "string01" "string02" "string10" "string11"
[5] "stringx"
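
The ifelse() above assumes at most two digits. As a minimal sketch of the sprintf() padding the question alludes to, the base R helper below pads to whatever the widest number is. The name pad_digits is hypothetical, and it assumes each element is some text followed by an optional non-negative integer small enough for as.integer():

```r
# Hypothetical helper: left-pad the trailing integer of each string with
# zeros so that plain lexical sorting agrees with numeric order.
pad_digits <- function(x) {
  prefix  <- sub("[0-9]+$", "", x)            # text before the trailing digits
  digits  <- substring(x, nchar(prefix) + 1)  # trailing digits, "" if none
  has_num <- nzchar(digits)
  width   <- max(nchar(digits))               # widest number sets the padding
  padded  <- x
  padded[has_num] <- paste0(
    prefix[has_num],
    sprintf(paste0("%0", width, "d"), as.integer(digits[has_num]))
  )
  padded
}

x <- c("string1", "string11", "string2", "string100", "stringx")
padded <- pad_digits(x)
print(padded)
# Order the original (unpadded) strings by their padded form
print(x[order(padded)])
```

Using x[order(pad_digits(x))] rather than sort(pad_digits(x)) keeps the original labels, in case the zero-padded form is not wanted in the data itself.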

Upvotes: 1

d.b

Reputation: 32548

You could use str_extract from the stringr package to obtain the digits and order by them, then append the strings without digits, sorted alphabetically:

x = c("string1","string3","stringZ","string2","stringX","string10")
library(stringr)
c(x[grepl("\\d+",x)][order(as.integer(str_extract(x[grepl("\\d+",x)],"\\d+")))], 
   sort(x[!grepl("\\d+",x)]))
#[1] "string1"  "string2"  "string3"  "string10" "stringX"  "stringZ" 

Upvotes: 1

989

Reputation: 12937

You could go for mixedsort in gtools:

vec <- c("string1", "string10", "string11", "string12", "string13","string2", 
         "string20", "string21", "string22", "string23")

library(gtools)
mixedsort(vec)

#[1] "string1"  "string2"  "string10" "string11" "string12" "string13"
# "string20" "string21" "string22" "string23"

Upvotes: 1

akrun

Reputation: 887213

We can remove the characters that are not digits and order by the resulting integers (here dt is the data.table and col1 is the string column):

dt[order(as.integer(gsub("\\D+", "", col1)))]
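
As a minimal self-contained sketch of the same idea, the snippet below applies it to a plain character vector in base R, so it runs without data.table. The names vec, num and sorted are illustrative only:

```r
vec <- c("string10", "string2", "stringX", "string1", "string21")

# Strip every non-digit character, leaving only the number ("" if none);
# as.integer("") is NA, so strings without digits get NA.
num <- as.integer(gsub("\\D+", "", vec))

# order() sorts by numeric value and puts NA last by default.
sorted <- vec[order(num)]
print(sorted)
```

Note that this orders by the number alone and ignores the text prefix, so it fits best when all strings share the same prefix. With several prefixes, one option would be to order by the prefix first and the number second.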

Upvotes: 6
