Marc

Reputation: 1185

Looking for a regex that matches all words, except the ones [inside brackets]

I'm trying to write a regular expression that matches all words inside a specific string, but skips words inside brackets. I currently have one regex that matches all words:

/[a-z0-9]+(-[a-z0-9]+)*/i

I also have a regex that matches all words inside brackets:

/\[(.*)\]/i

I basically want to match everything that the first regex matches, but without everything the second regex matches.

Sample input text: http://gist.github.com/222857. It should match every word separately, except the ones in the brackets.

Any help is appreciated. Thanks!

Upvotes: 4

Views: 1774

Answers (6)

AleB

Reputation: 82

This seems to work:

[^\[][a-z0-9]+(-[a-z0-9]+)*

If the first character of a word is an opening bracket, it doesn't match it.

By the way, is there a reason why you are capturing the words with dashes in them? If there is no need for that, your regex could be simplified.

Upvotes: -1

Alan Moore

Reputation: 75232

Which Ruby version are you using? If it's 1.9 or later, this should do what you want:

/(?<![\[a-z0-9-])[a-z0-9]+(-[a-z0-9]+)*(?![\]a-z0-9-])/i
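A minimal sketch of how this pattern behaves, assuming Ruby 1.9 or later and a made-up sample string (the text from the gist is not reproduced here). The only change to the pattern is that the group is non-capturing, so that String#scan returns whole matches. Note that the lookarounds only reject words that directly touch a bracket, so a word in the middle of a multi-word bracketed group still matches:

```ruby
# Made-up sample text (an assumption; not the text from the gist).
text = "alpha [beta] gamma-delta [one two three]"

# Same lookbehind/lookahead pattern, with a non-capturing group so that
# String#scan returns the full matches rather than the group contents.
pattern = /(?<![\[a-z0-9-])[a-z0-9]+(?:-[a-z0-9]+)*(?![\]a-z0-9-])/i

words = text.scan(pattern)
p words  # => ["alpha", "gamma-delta", "two"]
```

If the brackets in the real input only ever hold a single word, this limitation does not come up.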

Upvotes: 1

glenn mcdonald

Reputation: 15488

How 'bout this:

your_text.scan(/\[.*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
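To illustrate the idea, here is a small sketch on a made-up sample string (an assumption; the gist's text is not reproduced here). The bracket alternative consumes bracketed text without capturing it, so those matches show up as nil and can be dropped. The sketch also tries a non-greedy `.*?`, which is a variation on the pattern above rather than part of it, because the greedy form appears to swallow everything between the first `[` and the last `]` on a line:

```ruby
# Made-up sample text (an assumption; not the text from the gist).
text = "The quick [brown fox] jumps over the well-known [lazy] dog"

# Greedy form, as in the answer: one bracket match can span several groups.
greedy = /\[.*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i
# Non-greedy variation: each bracketed group is consumed on its own.
lazy = /\[.*?\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i

# scan returns one array per match; bracket matches yield [nil],
# so flatten and compact leave just the captured words.
greedy_words = text.scan(greedy).flatten.compact
lazy_words = text.scan(lazy).flatten.compact

p greedy_words  # => ["The", "quick", "dog"]
p lazy_words    # => ["The", "quick", "jumps", "over", "the", "well-known", "dog"]
```

Using `flatten.compact` instead of `- [[nil]]` is only a convenience here: it returns a flat array of strings rather than an array of one-element arrays.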

Upvotes: 1

cgr

Reputation: 1121

I agree with Shhnap. Without more info, it sounds like the easiest way is to remove what you don't want, but the regex needs to be /\[(.*?)\]/ instead. After that you can split on \s.

If you are trying to iterate through each word, and you want each word to match, maybe you can cheat a little with string.split(/\W+/). You will lose the quotation marks and whatnot, but you get each word.
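A quick sketch of the difference between the two splits, on a made-up sample string (an assumption; the gist's text is not reproduced here). It suggests one trade-off to keep in mind: splitting on \W+ also breaks hyphenated words apart, which the question's own word pattern keeps together:

```ruby
# Made-up sample text (an assumption; not the text from the gist).
text = "The quick [brown fox] jumps over the well-known [lazy] dog"

# Remove each bracketed group first (non-greedy, one group at a time).
stripped = text.gsub(/\[.*?\]/, "")

# Splitting on whitespace keeps hyphenated words intact
# (and would also keep any punctuation attached to a word).
by_space = stripped.split(/\s+/)

# Splitting on non-word characters drops punctuation but also splits on "-".
by_nonword = stripped.split(/\W+/)

p by_space    # => ["The", "quick", "jumps", "over", "the", "well-known", "dog"]
p by_nonword  # => ["The", "quick", "jumps", "over", "the", "well", "known", "dog"]
```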

Upvotes: 0

Robert Massaioli

Reputation: 13477

I don't think I understand the question properly. Why not just make a new string that does not contain the second regex like so:

string1 = string1.gsub(/\[(.*)\]/, '')

Off the top of my head, won't that remove whatever the second regex matched and store the result in string1? I have not tested this yet, though. I might test it later.

Upvotes: 0

Greg Hewgill

Reputation: 993125

Perhaps you could do it in two steps:

  1. Remove all the text within brackets.
  2. Use a regular expression to match the remaining words.

Using a single regular expression to try to do both these things will end up being more complicated than it needs to be.
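The two steps above could be sketched roughly as follows, assuming Ruby and a made-up sample string (the gist's text is not reproduced here). The word pattern is the one from the question, with its group made non-capturing so that String#scan returns whole words:

```ruby
# Made-up sample text (an assumption; not the text from the gist).
text = "The quick [brown fox] jumps over the well-known [lazy] dog"

# Step 1: remove all the text within brackets. The non-greedy .*? keeps
# separate bracketed groups from being merged into one big match.
without_brackets = text.gsub(/\[.*?\]/, "")

# Step 2: match the remaining words with the question's word pattern.
words = without_brackets.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/i)

p words  # => ["The", "quick", "jumps", "over", "the", "well-known", "dog"]
```

Neither step handles nested brackets; that case would need a different approach.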

Upvotes: 3
