Archonic

Reputation: 5362

Detect specific format of version number using regex

I'm looking to select the elements of an array that contain a version number, where a version number is either at the start or end of a string or surrounded by spaces, and is a series of digits and periods that does not start or end with a period. For example, "10.10 Thingy" and "Thingy 10.10.5" are valid, but "Whatever 4" is not.

haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)*(?=$| )/] }
=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]

I'm not sure how to modify the regex to require at least one period so that "Whatever 4" is not in the results.

Upvotes: 1

Views: 62

Answers (2)

Cary Swoveland

Reputation: 110725

This is only a slight variant of Archonic's answer.

r = /
    (?<=\A|\s) # match the beginning of the string or a space in a positive lookbehind
    (?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times 
    \d+        # match >= 1 digits
    (?=\s|\z)  # match a space or the end of the string in a positive lookahead
    /x         # free-spacing regex definition mode

haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]

haystack.select { |str| str =~ r }
  #=> ["10.10 Thingy", "Thingy 10.10.5"]
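
As a quick sanity check of the "does not start or end with a period" requirement, here is a minimal sketch that runs the same pattern against a few extra inputs. The constant name VERSION_RE and the edge-case strings are made up for illustration; only the pattern itself comes from the answer. Enumerable#grep is used as a shorthand for the select call above.

```ruby
# VERSION_RE is a hypothetical name; the pattern is the one from the answer above.
VERSION_RE = /(?<=\A|\s)(?:\d+\.)+\d+(?=\s|\z)/

# Extra inputs invented for illustration.
edge_cases = ["a 1.2.3 b", "v1.2", "1.2.", ".1.2", "1..2", "7"]

# grep keeps the elements for which VERSION_RE === element is true.
accepted = edge_cases.grep(VERSION_RE)
rejected = edge_cases - accepted

p accepted
p rejected
```

Only "a 1.2.3 b" should be accepted here: the others either touch a non-space character, begin or end with a period, or contain no period at all.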

The question was not to return the version information, but to return the strings that contain correct version information. As a result, there is no need for the lookarounds:

r = /
    (?:\A|\s)  # match the beginning of the string or a space in a non-capture group
    (?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
    \d+        # match >= 1 digits
    (?:\s|\z)  # match a space or the end of the string in a non-capture group
    /x         # free-spacing regex definition mode

haystack.select { |str| str =~ r }
  #=> ["10.10 Thingy", "Thingy 10.10.5"]
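
The practical difference between the two forms shows up only when you look at the matched text rather than at whether a match exists. The following is a rough sketch under the assumption that the version without lookarounds is written with non-capturing groups; the variable names and the sample string are invented for illustration.

```ruby
# Both patterns accept the same strings; they differ in what the match contains.
with_lookarounds    = /(?<=\A|\s)(?:\d+\.)+\d+(?=\s|\z)/
without_lookarounds = /(?:\A|\s)(?:\d+\.)+\d+(?:\s|\z)/

str = "Thingy 10.10.5 beta"   # sample string made up for this sketch

bare   = str[with_lookarounds]     # lookarounds consume nothing, so only the version is returned
padded = str[without_lookarounds]  # the surrounding spaces are part of the match

p bare    # "10.10.5"
p padded  # " 10.10.5 "
```

For select this makes no difference, but it matters as soon as the matched text itself is used.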

Suppose one wanted to obtain both the strings that contain valid versions and the versions contained in those strings. One could write the following:

r = /
    (?<=\A|\s)  # match the beginning of string or a space in a pos lookbehind
    (?:\d+\.)+  # match >= 1 digits then a period in non-capture group, >= 1 times 
    \d+         # match >= 1 digits
    (?=\s|\z)   # match a space or end of string in a pos lookahead
    /x          # free-spacing regex definition mode

haystack.each_with_object({}) do |str,h|
  version = str[r]
  h[str] = version if version
end
  # => {"10.10 Thingy"=>"10.10", "Thingy 10.10.5"=>"10.10.5"}
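
Note that str[r] returns only the first match. If a string might contain more than one version, one possible variation (not part of the original question) is to collect every match with String#scan. The sample strings below are invented for illustration, and the sketch relies on the pattern having no capture groups, so that scan returns the full matches.

```ruby
r = /(?<=\A|\s)(?:\d+\.)+\d+(?=\s|\z)/

# Sample data made up for this sketch.
haystack = ["10.10 Thingy", "Upgrade 1.2.3 to 1.3.0", "Whatever 4"]

versions_by_string = haystack.each_with_object({}) do |str, h|
  versions = str.scan(r)                    # array of full matches (r has no capture groups)
  h[str] = versions unless versions.empty?  # skip strings without a valid version
end

p versions_by_string
```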

Upvotes: 2

Archonic

Reputation: 5362

Ah hah! I knew I was close.

haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)+(?=$| )/] }

The asterisk at the end of (\.\d+)* allowed that group to repeat any number of times, including zero. You can bound the repetition with (\.\d+){x,y}, where x and y are the minimum and maximum number of times, or give only a minimum with (\.\d+){x,}. In my case I wanted at least one repetition, which would be (\.\d+){1,}, but that's synonymous with (\.\d+)+. That only took half the day to figure out...
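
To make the quantifier differences concrete, here is a small sketch comparing the four forms side by side. The variable names, the {1,2} bound, and the "Build 1.2.3.4" sample are made up for illustration.

```ruby
star  = /(?<=^| )(\d+)(\.\d+)*(?=$| )/      # zero or more ".digits" groups
plus  = /(?<=^| )(\d+)(\.\d+)+(?=$| )/      # one or more
brace = /(?<=^| )(\d+)(\.\d+){1,}(?=$| )/   # at least one, same as plus
range = /(?<=^| )(\d+)(\.\d+){1,2}(?=$| )/  # between one and two

samples = ["Whatever 4", "10.10 Thingy", "Thingy 10.10.5", "Build 1.2.3.4"]

by_star  = samples.select { |s| s.match?(star) }   # all four strings
by_plus  = samples.select { |s| s.match?(plus) }   # everything except "Whatever 4"
by_brace = samples.select { |s| s.match?(brace) }  # identical to by_plus
by_range = samples.select { |s| s.match?(range) }  # also drops "Build 1.2.3.4"

p by_star, by_plus, by_brace, by_range
```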

Upvotes: 1
