Reputation: 4517
I have a string formatted like:
project-version-project_test-type-other_info-other_info.file_type
I can strip most of the information I need out of this string in most cases. My trouble arises when my version has an extra qualifying character in it (i.e. normally 5 characters but sometimes a 6th is added).
Previously, I was using substrings to remove the excess information and get the 'project_test-type'; however, I now need to switch to a regex (mostly to handle that extra version character). I could keep using substrings and change the length depending on whether I have that extra version character, but a regex seems more appropriate here.
I tried using patterns like:
my ($type) = $_ =~ /.*-.*-(.*)-.*/;
But the extra '-' in the 'project_test-type' means I can't simply delimit the fields in my regex with that character.
What regex can I use to get the 'project_test-type' out of my string?
More information: As a more human readable example, the information is grouped in the following way:
project - version - project_test-type - other_info - other_info . file_type
Upvotes: 1
Views: 128
Reputation: 118605
Greedy/non-greedy approach
($type) = /.*?-.*?-(.*)-.*-.*/;
.*? is a non-greedy match: it matches any number of any character, but no more than necessary for the regular expression to match. Using .* between the second and third dashes is a greedy match: it matches as many characters as possible while still letting the regular expression match, so it will capture words with any extra dashes in them.
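As a quick check of the pattern above, here is a minimal sketch that runs it against two made-up file names, one with a 5-character version and one with the extra 6th character. The names and the helper type_greedy are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical file names; the second one has the extra version character.
my @names = (
    'foo-1.2.3-foo_test-unit-linux-x86.log',
    'foo-1.2.3a-foo_test-unit-linux-x86.log',
);

# Hypothetical helper: returns the captured type, or undef if nothing matches.
sub type_greedy {
    my ($name) = @_;
    # Non-greedy for the first two fields, greedy for the captured field.
    my ($type) = $name =~ /.*?-.*?-(.*)-.*-.*/;
    return $type;
}

print type_greedy($_), "\n" for @names;
```

Both names should yield foo_test-unit, because the greedy group keeps every dash except the last two, which the trailing -.*-.* needs.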
Upvotes: 0
Reputation: 385655
Since no field other than the desired one can contain -, any extra - belongs to the desired field.
      +--------------------------- project
      |     +--------------------- version
      |     |   +----------------- project_test-type
      |     |   |      +---------- other_info
      |     |   |      |     +---- other_info.file_type
      |     |   |      |     |
  ____| ____|  _|  ____| ____|
/^[^-]*-[^-]*-(.*)-[^-]*-[^-]*\z/
[^-] matches a character that's not a -.
[^-]* matches zero or more characters that aren't -.
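A minimal sketch of how this anchored pattern might be used follows. The helper extract_type and the sample names are hypothetical. Because the pattern is anchored with ^ and \z, a name with too few fields does not match at all, so the sketch checks for undef.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the type field, or undef if the name
# does not have the expected five dash-separated fields.
sub extract_type {
    my ($name) = @_;
    # In list context a failed match returns an empty list, so $type stays undef.
    my ($type) = $name =~ /^[^-]*-[^-]*-(.*)-[^-]*-[^-]*\z/;
    return $type;
}

# Made-up inputs: one well-formed name, one with too few fields.
for my $name ('foo-1.2.3a-foo_test-unit-linux-x86.log', 'foo-1.2.3-foo_test.log') {
    my $type = extract_type($name);
    print defined $type ? "$name => $type\n" : "$name => no match\n";
}
```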
Upvotes: 5
Reputation: 8743
To match everything:
/^([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)\.([a-zA-Z0-9]+)$/
[] defines character sets and ^ at the beginning of a set means "NOT". Also a - in a set usually means a range, unless it is at the beginning or end. So [^-]+ consumes as many non-dash characters as possible (at least one).
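To illustrate, a short sketch that uses the pattern above to pull all six fields into a list. The helper parse_name and the sample file name are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the six captured fields in list context,
# or an empty list if the name does not match.
sub parse_name {
    my ($name) = @_;
    return $name =~ /^([^-]+)-([^-]+)-(.+)-([^-]+)-([^-]+)\.([a-zA-Z0-9]+)$/;
}

# Made-up file name with the extra version character.
my ($project, $version, $type, $info1, $info2, $ext) =
    parse_name('foo-1.2.3a-foo_test-unit-linux-x86.log');

print "project=$project version=$version type=$type\n";
print "info=$info1,$info2 ext=$ext\n";
```

Dots in the version are harmless here: only the final \.([a-zA-Z0-9]+)$ is tied to the end of the string, so it picks up the file type.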
Upvotes: 1
Reputation: 13640
You can use
/\w+\s*-\s*\d{5}[a-zA-Z]?\s*-\s*(.*?)(?=\s*-\s*\d)/
Explanation:
\w+\s*- ==> match a sequence of word characters followed by any number of spaces and a -
\d{5}[a-zA-Z]? ==> always 5 digits, followed by zero or one letter
(.*?) ==> match everything in a non-greedy way
(?=\s*-\s*\d) ==> look ahead for a digit and stop (since IP starts with a digit)
Upvotes: 0