Rikouchi

Reputation: 3

Ruby: how to sort array of string parsing the content

Here is my problem: I have an array of strings containing data like this:

array = ["{109}{08} OK",
         "{98} Thx",
         "{108}{0.8}{908} aa",
         "{8}{51} lorem ipsum"]

I would like to sort this array by scanning the data inside each string, here the integers in braces. The final array should look like this:

array.custom_sort! => ["{8}{51} lorem ipsum",
                       "{98} Thx",
                       "{108}{0.8}{908} aa",
                       "{109}{08} OK"]

Is there a nice way to do this in Ruby? Or should I build a new array and insert each parsed element into it?

EDIT:

I failed to mention the sort priorities. First, the sort is based on the numbers in braces: there can be up to 3 groups, and they cannot be absent.

["{5}something",
 "{61}{64}could",
 "{}be",                  #raise an error or ignore it
 "{54}{31.24}{0.2}write",
 "{11}{21}{87}{65}here",  #raise an error or ignore it
 "[]or",                  #raise an error or ignore it
 "{31}not"]

If the first numbers are equal, then the second ones should be compared. Some examples:

"{15}" < "{151}" < "{151}{32}" < "{152}"
"{1}" < "{012}" < "{12}{-1}{0}" < "{12.0}{0.2}"
"{5}" < "{5}{0}" < "{5}{0}{1}"

But if all the numbers are equal, then the strings are compared. The only character that causes a problem is the space, which must sort after every other "visible" character. Examples:

"{1}a" < "{1}aa" < "{1} a" < "{1}  a"
"{1}" < "{1}a " < "{1}a  " < "{1}a  a"
"{1}a" < "{1}ba" < "{1}b "

I can do it with something like this in a custom class:

class CustomArray
  attr_accessor :one
  attr_accessor :two
  attr_accessor :three
  attr_accessor :text 

  def <=>(other)
    if self.one.to_f < other.one.to_f
      return -1
    elsif self.one.to_f > other.one.to_f
      return 1
    elsif self.two.nil?
      if other.two.nil?
        min = [self.text.size, other.text.size].min
        i = 0
        until i == min
          if self.text[i].chr == ' ' #.chr is for compatibility with Ruby 1.8.x
            if other.text[i].chr != ' '
              return 1
            end
          else
            if other.text[i].chr == ' '
              return -1

          #...

    self.text <=> other.text
  end
end

It works fine, but I am very frustrated to be coding in Ruby the way I would code in a C++ project. That is why I would like to know how to use a "custom sort in a foreach method" with a more complex ordering (one that requires parsing, scanning, regexps) than a naive one based on an attribute of the content.

Upvotes: 0

Views: 208

Answers (4)

Cary Swoveland

Reputation: 110755

[Edit: My initial solution, which follows this edit, does not work with the revised statement of the question. I will leave it, however, as it might be of interest regardless.

The following is a way to perform the sort under the revised rules, as I understand them. If I have misinterpreted the rules, I expect the fix would be minor.

Regex to use

Let's start with the regex I'll use:

R = /
    \{       # match a left brace
    (        # begin capture group
    \d+      # match one or more digits
    (?:      # begin non-capture group
    \.       # match a decimal point
    \d+      # match one or more digits
    )        # end non-capture group
    |        # or
    \d*      # match zero or more digits
    )        # end capture group
    \}       # match a right brace
    /x

Examples:

a = ["{5}something", "{61}{64}could", "{}be", "{54}{31.24}{0.2}write",
     "{11}{21}{87}{65}here", "[]or", "{31}not", "{31} cat"]
a.each_with_object({}) { |s,h| h[s] = s.scan(R).flatten }
  # => {"{5}something"        =>["5"],
  #    "{61}{64}could"        =>["61", "64"],
  #    "{}be"                 =>[""],
  #    "{54}{31.24}{0.2}write"=>["54", "31.24", "0.2"],
  #    "{11}{21}{87}{65}here" =>["11", "21", "87", "65"],
  #    "[]or"                 =>[],
  #    "{31}not"              =>["31"],
  #    "{31} cat"             =>["31"]} 

custom_sort method

We can write the method custom_sort as follows (change sort_by to sort_by! for custom_sort!):

class Array
  def custom_sort
    sort_by do |s|
      a = s.scan(R).flatten
      raise SyntaxError,
        "'#{s}' contains empty braces" if a.any?(&:empty?)
      raise SyntaxError,
        "'#{s}' contains zero or > 3 pair of braces" if a.size.zero?||a.size > 3
      a.map(&:to_f) << s[a.join.size+2*a.size..-1].tr(' ', 255.chr)
    end
  end
end

Examples

Let's try it:

a.custom_sort
  #=> SyntaxError: '{}be' contains empty braces

Remove "{}be" from a:

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "{11}{21}{87}{65}here", "[]or", "{31}not", "{31} cat"]
a.custom_sort
  #=> SyntaxError: '{11}{21}{87}{65}here' contains zero or > 3 pair of braces

Remove "{11}{21}{87}{65}here":

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "[]or", "{31}not", "{31} cat"]
a.custom_sort
  #=> SyntaxError: '[]or' contains zero or > 3 pair of braces

Remove "[]or":

a = ["{5}something", "{61}{64}could", "{54}{31.24}{0.2}write",
     "{31}not", "{31} cat"]
a.custom_sort
  #=> ["{5}something",
  #    "{31}not",
  #    "{31} cat",
  #    "{54}{31.24}{0.2}write", "{61}{64}could"] 

Explanation

Suppose one of the strings to be sorted was:

s = "{54}{31.24}{0.2}write a letter"

Then in the sort_by block, we would compute:

a = s.scan(R).flatten
  #=> ["54", "31.24", "0.2"]
raise SyntaxError, "..." if a.any?(&:empty?)
  #=> raise SyntaxError, "..." if false 
raise SyntaxError, "..." if a.size.zero?||a.size > 3
  #=> raise SyntaxError, "..." if false || false
b = a.map(&:to_f)
  #=> [54.0, 31.24, 0.2] 
t = a.join
  #=> "5431.240.2" 
n = t.size + 2*a.size
  #=> 16 
u = s[n..-1]
  #=> "write a letter" 
v = u.tr(' ', 255.chr)
  #=> "write\xFFa\xFFletter" 
b << v
  #=> [54.0, 31.24, 0.2, "write\xFFa\xFFletter"] 

Note that the use of String#tr (or you could use String#gsub) puts spaces at the end of the sort order of ASCII characters:

255.times.all? { |i| i.chr < 255.chr }
  #=> true
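
The same space-last ordering can also be sketched without 255.chr, by comparing arrays of byte values in which the space is mapped to a number above every real byte. This is a swapped-in technique, not the method used above; space_last_key is a hypothetical helper name, and the sample strings are the text parts of the examples in the question.

```ruby
# Hypothetical helper (not part of the answer above): turn a string into an
# array of byte values, replacing the space (32) with 256 so that it compares
# greater than any real byte (0..255).
def space_last_key(text)
  text.bytes.map { |b| b == 32 ? 256 : b }
end

# Text parts of the question's example "{1}a" < "{1}aa" < "{1} a" < "{1}  a".
tails = ["  a", "aa", " a", "a"]

# Arrays are compared element by element, so the keys sort as the rule requires.
sorted = tails.sort_by { |t| space_last_key(t) }
p sorted
```

Because the keys are arrays of integers, this sketch does not depend on string encodings, at the cost of building one array per string.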

tidE]

I have assumed that, in sorting, pairs of strings are to be compared in a manner analogous to Array#<=>. The first comparison considers the strings of digits within the first pair of braces in each string (after conversion to a float). Ties are broken by comparing the strings of digits in the second pairs of braces (converted to floats). If there is still a tie, the digits enclosed in the third pairs of braces are compared, etc. If one string has n pairs of braces and another has m > n pairs, and the values within the braces are the same for the first n pairs, I assume the first string is to precede the second in the sort.
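
As a small illustration of the Array#<=> behaviour this assumption relies on, here is a sketch using made-up keys of the kind the sort produces (the variable names are only for illustration):

```ruby
# Element-wise comparison: the first unequal pair decides the result.
tie_break = [109.0, 8.0] <=> [109.0, 7.0]   # 1, because 8.0 > 7.0

# When one array is a prefix of the other, the shorter array sorts first.
prefix_case = [98.0] <=> [98.0, 1.0]        # -1

# Sorting arrays of floats therefore orders by first value, then second, etc.
keys = [[109.0, 8.0], [98.0], [108.0, 908.0], [8.0, 51.0]]
sorted_keys = keys.sort
p sorted_keys
```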

Code

R = /
    \{    # match char
    (\d+) # capture digits
    \}    # match char
    +     # one or more right braces (the quantifier binds to the brace only)
    /x

class Array
  def custom_sort!
    sort_by! { |s| s.scan(R).map { |e| e.first.to_f } }
  end
end

Example

array = ["{109}{08} OK",
         "{109}{07} OK",
         "{98} Thx",
         "{108}{0.8}{908} aa",
         "{108}{0.8}{907} aa",
         "{8}{51} lorem ipsum"]

a = array.custom_sort!
  #=> ["{8}{51} lorem ipsum",
  #    "{98} Thx",
  #    "{108}{0.8}{907} aa",
  #    "{108}{0.8}{908} aa",
  #    "{109}{07} OK",
  #    "{109}{08} OK"]

array == a
  #=> true

Explanation

Let's now calculate the value in Array#sort_by!'s block for the first element of the original array:

s = "{109}{08} OK"

a = s.scan(R)
  #=> [["109"], ["08"]] 
b = a.map { |e| e.first.to_f }
  #=> [109.0, 8.0] 

Let's now do the same for the other strings and put the results in an array:

c = array.map { |s| [s, s.scan(R).map { |e| e.first.to_f }] }
  #=> [["{8}{51} lorem ipsum", [8.0, 51.0]],
  #    ["{98} Thx",            [98.0]],
  #    ["{108}{0.8}{907} aa",  [108.0, 907.0]],
  #    ["{108}{0.8}{908} aa",  [108.0, 908.0]],
  #    ["{109}{07} OK",        [109.0, 7.0]],
  #    ["{109}{08} OK",        [109.0, 8.0]]] 

sort_by in custom_sort! is therefore equivalent to:

c.sort_by(&:last).map(&:first)
  #=> ["{8}{51} lorem ipsum",
  #    "{98} Thx",
  #    "{108}{0.8}{907} aa",
  #    "{108}{0.8}{908} aa",
  #    "{109}{07} OK",
  #    "{109}{08} OK"]

Upvotes: 1

Michael Chaney

Reputation: 3041

array.sort_by { |v| (v =~ /(\d+)/) && $1.to_i }

Alternatively:

array.sort_by { |v| /(\d+)/.match(v)[1].to_i }

Upvotes: 1

DiegoSalazar

Reputation: 13541

This should do it:

array.sort_by do |s| 
  # regex match the digits within the first pair of curly braces
  s.match(/^\{(\d+)\}/)[1].to_i # convert to an int in order to sort
end

# => ["{8}{51} lorem ipsum", "{98} Thx", "{108}{0.8}{908} aa", "{109}{08} OK"]

Upvotes: 1

Andrew Larson

Reputation: 483

You can pass Array#sort a block defining how it should order elements.
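
A minimal sketch of what that could look like for the original (pre-edit) question, assuming only the integer in the first pair of braces matters; first_number is a hypothetical helper, not something from the question:

```ruby
array = ["{109}{08} OK",
         "{98} Thx",
         "{108}{0.8}{908} aa",
         "{8}{51} lorem ipsum"]

# Hypothetical helper: the integer inside the first pair of braces.
# String#[] with a regex and a group index returns nil when nothing matches,
# and nil.to_i is 0, so strings without a number sort first in this sketch.
def first_number(s)
  s[/\{(\d+)\}/, 1].to_i
end

# The block must return -1, 0 or 1, which is what <=> provides.
sorted = array.sort { |a, b| first_number(a) <=> first_number(b) }
p sorted
```

With a block, the helper runs on every comparison; sort_by computes each key only once, which is usually preferable when the key requires parsing.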

Upvotes: -2
