Dimi

Reputation: 1267

Using Regex to combine delimited data

I am attempting to optimize a couple of applications by using regex.

What we are currently using is absolutely terrible, and I am limited to using regular expressions for data manipulation.

Variable fruits has the following value: apple_banana_kiwi_cherry_cucumber_tomato_car_telephone

I want to grab everything between the 2nd and 5th occurrence of _

For instance, in the case of apple_banana_kiwi_cherry_cucumber_tomato_car_telephone

the result should be:

kiwi_cherry_cucumber

What I have right now is ^[a-zA-Z]+_[a-zA-Z]+_([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+)_

Is this the most efficient way to extract data out of the string? Also, is there a better way to write this expression so it is easier to read?

Upvotes: 0

Views: 28

Answers (1)

Pushpesh Kumar Rajwanshi

Reputation: 18357

You can use this regex and read the result from group 1:

(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)

Explanation:

  • (?:[^_]*_){2} - This non-capturing group matches, without capturing, the text up to and including the second underscore.
  • ((?:[^_]*_){2}[^_]*) - This captures into group 1 the next two underscore-terminated fields followed by zero or more characters other than _ (the [^_]* part), so the capture stops just before the fifth underscore.

This gives you everything between the second and the fifth underscore in group 1.
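As a minimal sketch of how the capture group is read, here is the same regex in Python. The question does not say which language or tool is in use, so Python's re module is an assumption here; the pattern itself is unchanged.

```python
import re

# Pattern from the answer: skip two "field_" chunks, then capture
# two more "field_" chunks plus one trailing field into group 1.
PATTERN = re.compile(r"(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)")

fruits = "apple_banana_kiwi_cherry_cucumber_tomato_car_telephone"

match = PATTERN.search(fruits)
result = match.group(1) if match else None
print(result)  # kiwi_cherry_cucumber
```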

Also, if you want only the first match rather than multiple matches, you can put the start anchor ^ in front of the regex, like this:

^(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)
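To illustrate the difference the anchor makes, the sketch below runs both forms over a longer input. It again assumes Python's re module, and the sample string is made up for the illustration. Without the anchor, the search resumes after the first match and can find a second group further along; with the anchor, only the match at the start of the string is possible.

```python
import re

# Hypothetical sample input with enough underscores for two matches.
sample = "a_b_c_d_e_f_g_h_i_j_k"

unanchored = re.compile(r"(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)")
anchored = re.compile(r"^(?:[^_]*_){2}((?:[^_]*_){2}[^_]*)")

# findall returns the contents of group 1 for every match.
unanchored_matches = unanchored.findall(sample)
anchored_matches = anchored.findall(sample)

print(unanchored_matches)  # ['c_d_e', 'g_h_i']
print(anchored_matches)    # ['c_d_e']
```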

Your regex ^[a-zA-Z]+_[a-zA-Z]+_([a-zA-Z]+_[a-zA-Z]+_[a-zA-Z]+)_ is also correct, but it only allows letters between the underscores, so use it if that restriction is what you want. Otherwise use mine, which is a little more compact because it uses quantifiers. It is also easier to extend: if you later need everything between the Nth and Mth underscore, where N and M may be larger numbers, you only have to change the counts in the quantifiers.
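One way to sketch that N-to-M extension is to build the quantifier counts from parameters. The helper name between_underscores and the use of Python are assumptions for illustration only. Note that, like the regex above, this version does not require the Mth underscore to actually be present; if the string ends first, the capture runs to the end of the string.

```python
import re

def between_underscores(text, n, m):
    """Return the text between the n-th and m-th underscore, or None.

    Hypothetical helper: n = 0 means "from the start of the string".
    """
    if not 0 <= n < m:
        raise ValueError("expected 0 <= n < m")
    # Skip n "field_" chunks, then capture (m - n - 1) "field_" chunks
    # plus one final field that stops before the m-th underscore.
    pattern = r"^(?:[^_]*_){%d}((?:[^_]*_){%d}[^_]*)" % (n, m - n - 1)
    match = re.match(pattern, text)
    return match.group(1) if match else None

fruits = "apple_banana_kiwi_cherry_cucumber_tomato_car_telephone"
print(between_underscores(fruits, 2, 5))  # kiwi_cherry_cucumber
print(between_underscores(fruits, 1, 3))  # banana_kiwi
```

With n=2 and m=5 the generated pattern is exactly the anchored regex shown earlier.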

Upvotes: 1
