0_o

Reputation: 590

Regex to match all words inside parentheses

Imagine this is a part of a large text:

stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) stuff (word7/word8/word9) stuff / stuff, (w0rd10/word11) stuff stuff (word12) stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17).

I want the words inside the parentheses. The result must match:

word1
Word2
w0rd3
word4
word5
word6
word7
word8
word9
w0rd10
word11
word12
Word13
w0rd14
word15
word16
word17

Also, the result should not look like:

(word1) or (word1/Word2/w0rd3)

Basically, no (, ) or / is allowed in the matches.

What I have tried:

\((\w+)\/(\w+)\/(\w+)\)[^(]*\((\w+)\/(\w+)\)[^(]*\((\w+)\) 

This matches those words, but I have to repeat the pattern once for every group of words, which is not clean. I also tried txt2re, but its output is just as repetitive and it is not a one-line regex. In case I want to use it in an online regex evaluator where no coding is available, I need a short, one-line regex. My preferred engines are Python and C#.


Update: I have added some / characters to the text. Also, sorry for changing the accepted answer. All answers are correct in some way, but I have to choose the fastest and most efficient regex here.

Upvotes: 5

Views: 289

Answers (4)

bobble bubble

Reputation: 18490

A common solution is to check whether there is a closing ) ahead without any opening ( in between.

\w+\b(?=[^)(]*\))

Note that this pattern does not check for an opening ( before the word, but often that is not needed.
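
As a rough illustration, the lookahead pattern can be run with Python's re.findall on the sample text from the question. The variable names are placeholders chosen for this sketch.

```python
import re

# Sample text taken from the question (including the "/" added outside parentheses).
text = (
    "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) "
    "stuff (word7/word8/word9) stuff / stuff, (w0rd10/word11) stuff stuff (word12) "
    "stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17)."
)

# A word is kept only if a ")" follows before any other "(" or ")" is seen.
pattern = re.compile(r"\w+\b(?=[^)(]*\))")
words = pattern.findall(text)
print(words)
```

Words outside the parentheses are rejected because the lookahead runs into the next ( before it can reach a ).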

Upvotes: 2

The fourth bird

Reputation: 163277

You could use a capturing group, which re.findall returns, to match everything between the parentheses with a forward slash as the delimiter.

Then in the result you could split on a forward slash:

\((\w+(?:/\w+)*)\)

Explanation

  • \( Match opening parenthesis
  • ( Capturing group
    • \w+ Match 1+ word chars
    • (?:/\w+)* Match 0+ times a / and 1+ word chars
  • ) Close capturing group
  • \) Match closing parenthesis

If you want to match more than word characters, you could use a negated character class [^()/]+, which matches any character except parentheses or a forward slash:

\(([^()/]+(?:/[^()/]+)*)\)

For example:

import re

regex = r"\(([^()/]+(?:/[^()/]+)*)\)"
test_str = "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff stuff (word6) stuff (word7/word8/word9) stuff stuff, (w0rd10/word11) stuff stuff (word12) stuff (Word13/w0rd14/word15) stuff-stuff stuff (word16/word17)."
# Each match is one slash-separated group; split it into its words (a list of lists).
res = [group.split('/') for group in re.findall(regex, test_str)]

Or see the flattened version.
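
A flattened result could be produced along these lines. This is only a sketch using the \w+ variant of the pattern on a shortened sample string, and the names text and flat are made up for the example.

```python
import re

regex = r"\((\w+(?:/\w+)*)\)"
# Shortened sample, including a "/" outside the parentheses.
text = "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) stuff."

# findall returns one slash-separated string per parenthesised group;
# splitting each one inside a single comprehension yields a flat list.
flat = [word for group in re.findall(regex, text) for word in group.split("/")]
print(flat)  # ['word1', 'Word2', 'w0rd3', 'word4', 'word5', 'word6']
```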

Upvotes: 2

Vishnudev Krishnadas

Reputation: 10960

Use findall with a look-behind assertion:

(?<=[(/])\w+

>>> re.findall(r'(?<=[(/])\w+', input_string)
['word1', 'Word2', 'w0rd3', 'word4', 'word5', 'word6', 'word7', 'word8', 'word9', 'w0rd10', 'word11', 'word12', 'Word13', 'w0rd14', 'word15', 'word16', 'word17']

Explanation

(?<=[(/])\w+

Positive Lookbehind (?<=[(/])

  • Asserts that the character immediately before the current position matches the character class below, without making it part of the match
  • [(/] matches a single character present in the list
    • ( or / matches a single character

\w+

  • \w matches any word character (equal to [a-zA-Z0-9_] in ASCII mode)
  • + Quantifier - Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
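
One caveat worth checking, given the update to the question: the look-behind alone does not test whether the word is actually inside parentheses, so a word that directly follows a / elsewhere in the text is matched as well. The sketch below shows this on a shortened sample. The second pattern is a hypothetical refinement (not part of the original answer) that adds a lookahead for a closing ).

```python
import re

text = "stuff (word4/word5) stuff/stuff (word6) stuff"

# The look-behind alone also accepts a word that follows a "/" outside parentheses.
loose = re.findall(r"(?<=[(/])\w+", text)
print(loose)   # ['word4', 'word5', 'stuff', 'word6']

# Hypothetical refinement: additionally require a ")" ahead
# with no "(" or ")" in between.
strict = re.findall(r"(?<=[(/])\w+\b(?=[^()]*\))", text)
print(strict)  # ['word4', 'word5', 'word6']
```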

Upvotes: 1

Sweeper

Reputation: 271050

Instead of matching the words, you can write a regex that matches the non-words, and split by the regex:

\)?[^)]+?\(|\).+|/

A non-word is either:

  • an optional closing parenthesis followed by one or more characters that are not closing parentheses, followed by an opening parenthesis.
  • a closing parenthesis followed by some text (this is used to match the last bit of the string)
  • a slash
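
In Python, the split approach might look like the following sketch, which uses re.split on a shortened sample. The filtering step and the variable names are assumptions added for the example.

```python
import re

text = "stuff (word1/Word2/w0rd3) stuff, stuff (word4/word5) stuff/stuff (word6) stuff."

# Split on everything that is not a word inside parentheses.
parts = re.split(r"\)?[^)]+?\(|\).+|/", text)

# The text before the first "(" and after the last ")" is consumed by the
# separator, which leaves an empty string at each end of the result.
words = [p for p in parts if p]
print(words)  # ['word1', 'Word2', 'w0rd3', 'word4', 'word5', 'word6']
```

The pattern appears to rely on some text preceding the first ( and following the last ). If the input starts with ( or ends with ), the first or last piece keeps its parenthesis.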

Upvotes: 2
