Reputation: 5362
I'm looking to select the elements of an array that contain a version number, where a version number is either at the start or end of a string or surrounded by spaces, and is a series of digits and periods that does not start or end with a period. For example, "10.10 Thingy" and "Thingy 10.10.5" are valid, but "Whatever 4" is not.
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)*(?=$| )/] }
=> ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
I'm not sure how to modify the regex to require at least one period so that "Whatever 4" is not in the results.
Upvotes: 1
Views: 62
Reputation: 110725
This is only a slight variant of Archonic's answer.
r = /
(?<=\A|\s) # match the beginning of the string or a space in a positive lookbehind
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or the end of the string in a positive lookahead
/x # free-spacing regex definition mode
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
The question was not to return the version information, but to return the strings that have correct version information. As a result there is no need for the lookarounds; ordinary non-capture groups can match the boundaries instead:
r = /
(?:\A|\s) # match the beginning of the string or a space in a non-capture group
(?:\d+\.)+ # match >= 1 digits followed by a period in a non-capture group, >= 1 times
\d+ # match >= 1 digits
(?:\s|\z) # match a space or the end of the string in a non-capture group
/x # free-spacing regex definition mode
haystack.select { |str| str =~ r }
#=> ["10.10 Thingy", "Thingy 10.10.5"]
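One thing to keep in mind when dropping the lookarounds: the boundary character becomes part of the matched text. That makes no difference for select, but it does once you want the version itself. Here is a minimal sketch of the difference; the names consuming, lookarounds and sample are just mine for illustration.

```ruby
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]

# Hypothetical names, used only for this comparison.
consuming   = /(?:\A|\s)(?:\d+\.)+\d+(?:\s|\z)/   # boundaries are part of the match
lookarounds = /(?<=\A|\s)(?:\d+\.)+\d+(?=\s|\z)/  # boundaries are only asserted

# Both patterns select the same strings.
selected_consuming   = haystack.select { |str| str =~ consuming }
selected_lookarounds = haystack.select { |str| str =~ lookarounds }

# The matched text differs: the consuming pattern keeps the adjacent whitespace.
sample = "Thingy 10.10.5 beta"
with_padding    = sample[consuming]    # " 10.10.5 "
without_padding = sample[lookarounds]  # "10.10.5"
```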
Suppose one wanted to obtain both the strings that contain valid versions and the versions contained in those strings. One could write the following:
r = /
(?<=\A|\s) # match the beginning of string or a space in a pos lookbehind
(?:\d+\.)+ # match >= 1 digits then a period in non-capture group, >= 1 times
\d+ # match >= 1 digits
(?=\s|\z) # match a space or end of string in a pos lookahead
/x # free-spacing regex definition mode
haystack.each_with_object({}) do |str,h|
version = str[r]
h[str] = version if version
end
# => {"10.10 Thingy"=>"10.10", "Thingy 10.10.5"=>"10.10.5"}
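Note that str[r] returns only the first match. If a string could contain more than one version, something along these lines should collect all of them. This is only a sketch: the input strings and the name versions_by_string are made up for the example.

```ruby
r = /(?<=\A|\s)(?:\d+\.)+\d+(?=\s|\z)/

# Hypothetical input, including a string with two versions.
haystack = ["Upgrade 10.10 to 10.10.5", "Whatever 4", "Thingy 2.0"]

versions_by_string = haystack.each_with_object({}) do |str, h|
  versions = str.scan(r)  # r has no capture groups, so scan returns whole matches
  h[str] = versions unless versions.empty?
end
# => {"Upgrade 10.10 to 10.10.5"=>["10.10", "10.10.5"], "Thingy 2.0"=>["2.0"]}
```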
Upvotes: 2
Reputation: 5362
Ah hah! I knew I was close.
haystack.select{ |i| i[/(?<=^| )(\d+)(\.\d+)+(?=$| )/] }
The asterisk at the end of (\.\d+)* was allowing that group to repeat any number of times, including zero. You can bound the repetition with (\.\d+){x,y}, where x and y are the minimum and maximum counts, or specify only a minimum with (\.\d+){x,}. In my case I wanted a minimum of one, which would be (\.\d+){1,}; however, that's synonymous with (\.\d+)+. That only took half the day to figure out...
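For anyone who wants to see the quantifiers side by side, here is a quick sketch run against the haystack from the question. The variable names are just labels I picked, and the {2,} case is only there to show a different minimum.

```ruby
haystack = ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4", "Whatever 4.x"]

star       = /(?<=^| )\d+(\.\d+)*(?=$| )/     # zero or more ".digits" groups
plus       = /(?<=^| )\d+(\.\d+)+(?=$| )/     # one or more
min_one    = /(?<=^| )\d+(\.\d+){1,}(?=$| )/  # same meaning as +
min_two    = /(?<=^| )\d+(\.\d+){2,}(?=$| )/  # at least two ".digits" groups

with_star    = haystack.select { |s| s.match?(star) }
with_plus    = haystack.select { |s| s.match?(plus) }
with_min_one = haystack.select { |s| s.match?(min_one) }
with_min_two = haystack.select { |s| s.match?(min_two) }

# with_star    => ["10.10 Thingy", "Thingy 10.10.5", "Whatever 4"]
# with_plus    => ["10.10 Thingy", "Thingy 10.10.5"]
# with_min_one => ["10.10 Thingy", "Thingy 10.10.5"]
# with_min_two => ["Thingy 10.10.5"]
```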
Upvotes: 1