I am attempting to optimize a couple of applications by using regex.
What we are currently using is absolutely terrible, and I am more or less limited to using only regular expressions for data manipulation.
The variable fruits has the following value:
apple_banana_kiwi_cherry_cucumber_tomato_car_telephone
I need to grab everything between the 2nd and 5th occurrences of _.
For instance, in the case of apple_banana_kiwi_cherry_cucumber_tomato_car_telephone, the result should be:
kiwi_cherry_cucumber
What I have right now is ^[a-zA-Z]+_[a-zA-Z]+_([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+)_
Is this the most efficient way to extract data from the string? Also, is there a better way to write this expression so that it is easier to read?
You can use this regex and take the contents of group 1:
(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)
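A minimal sketch of using it, assuming Python's re module (the question does not say which language or regex engine is involved; the variable names are just for illustration):

```python
import re

fruits = "apple_banana_kiwi_cherry_cucumber_tomato_car_telephone"

# Group 1 holds everything between the 2nd and 5th underscores.
match = re.search(r"(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)", fruits)
if match:
    print(match.group(1))  # kiwi_cherry_cucumber
```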
Explanation:
(?:[^_]*_){2}
- This part matches, without capturing (since (?: ) is a non-capturing group), two runs of non-underscore characters, each followed by an underscore, i.e. everything up to and including the second underscore.
((?:[^_]*_){2}[^_]*)
- This part captures, in group 1, two more such runs together with their underscores (the third and fourth), followed by zero or more characters other than _, matched by [^_]*. Because [^_]* cannot match an underscore, the capture stops the moment it reaches the fifth underscore. Hence group 1 contains all the content between the second and fifth underscores.
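Since the question also asks about readability: if your regex engine supports a verbose/extended mode (Python's re.VERBOSE is assumed here; PATTERN is an illustrative name), the same pattern can be laid out with one comment per piece:

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE;
# whitespace and the # comments are ignored by the regex engine.
PATTERN = re.compile(
    r"""
    (?:[^_]*_){2}       # skip the first two fields and their underscores
    (                   # group 1 starts right after the 2nd underscore
        (?:[^_]*_){2}   # 3rd and 4th fields, each followed by an underscore
        [^_]*           # 5th field; stops before the 5th underscore
    )
    """,
    re.VERBOSE,
)

fruits = "apple_banana_kiwi_cherry_cucumber_tomato_car_telephone"
match = PATTERN.search(fruits)
if match:
    print(match.group(1))  # kiwi_cherry_cucumber
```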
Also, if you only want the first match rather than multiple matches, you can put the start anchor ^ before the regex, like this:
^(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)
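To see what the anchor changes, here is a small sketch with Python's re.findall on a hypothetical input that has more underscores than the question's example. Without ^, the engine keeps searching after the first match ends, so the second match counts underscores from that point rather than from the start of the string:

```python
import re

# Hypothetical input with ten underscores.
text = "a_b_c_d_e_f_g_h_i_j_k"

unanchored = re.findall(r"(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)", text)
anchored = re.findall(r"^(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)", text)

print(unanchored)  # ['c_d_e', 'g_h_i']
print(anchored)    # ['c_d_e']
```

The second unanchored match, 'g_h_i', is the text between the 6th and 9th underscores, which is usually not what you want; with ^ only the text between the 2nd and 5th underscores is returned.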
Your regex ^[a-zA-Z]+_[a-zA-Z]+_([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+)_ is also correct, but it only allows letters (at least one) between the underscores. Use it if you only want to allow letters there; otherwise use my regex, which is also a little more compact because it uses quantifiers. My regex is also easier to extend: if tomorrow you want to match all content between the Nth and Mth underscores, where N and M can be larger numbers, you only need to change the two quantifiers, giving ^(?:[^_]*_){N}((?:[^_]*_){M-N-1}[^_]*), and the regex stays just as short.
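A minimal sketch of that generalization in Python, assuming the pattern ^(?:[^_]*_){N}((?:[^_]*_){M-N-1}[^_]*) (skip N fields, then capture M-N-1 fields with their underscores plus one more field). The helper name between_underscores is hypothetical:

```python
import re
from typing import Optional


def between_underscores(text: str, n: int, m: int) -> Optional[str]:
    """Hypothetical helper: return the text between the n-th and m-th underscores.

    Builds ^(?:[^_]*_){n}((?:[^_]*_){m-n-1}[^_]*) for the given n and m.
    """
    if n < 0 or m <= n:
        raise ValueError("expected 0 <= n < m")
    # Doubled braces in the f-string produce literal { and } around the counts.
    pattern = rf"^(?:[^_]*_){{{n}}}((?:[^_]*_){{{m - n - 1}}}[^_]*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


fruits = "apple_banana_kiwi_cherry_cucumber_tomato_car_telephone"
print(between_underscores(fruits, 2, 5))  # kiwi_cherry_cucumber
print(between_underscores(fruits, 5, 7))  # tomato_car
```

Like the regex above, this does not require the m-th underscore to actually exist: between_underscores("a_b_c_d_e", 2, 5) still returns "c_d_e" even though that string has only four underscores. If the m-th underscore must be present, one option is to append the lookahead (?=_) to the pattern.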