NoDataDumpNoContribution

Reputation: 10859

RegEx pattern returning all words except those in parentheses

I have a text of the form:

können {konnte, gekonnt} Verb

I want a match for every word that is not inside the curly braces. That is:

können = 1st match, Verb = 2nd match

Unfortunately, I still haven't got the knack of regular expressions. There are plenty of tools for testing a pattern, but not much help for writing one, short of reading a book.

I will use them in Java or Python.

Upvotes: 1

Views: 62

Answers (2)

NeverHopeless

Reputation: 11233

Splitting on this pattern will do the job:

(?:\s+|\s*\{[^}]*\}\s*)

Ignore any empty strings in the result.
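
A minimal Python sketch of the split approach, for illustration only. It assumes the opening brace is escaped and the group is non-capturing, because Python's `re.split` returns the text of capturing groups along with the pieces. The helper name `words_outside_braces` is made up here.

```python
import re

# Separator: either a run of whitespace, or a {...} group with optional
# whitespace around it. The group is non-capturing so that re.split
# returns only the pieces between separators.
SEPARATOR = re.compile(r'(?:\s+|\s*\{[^}]*\}\s*)')

def words_outside_braces(text):
    """Split on whitespace and {...} groups, dropping empty pieces."""
    return [piece for piece in SEPARATOR.split(text) if piece]

print(words_outside_braces('können {konnte, gekonnt} Verb'))
# ['können', 'Verb']
```

The empty pieces appear because the space before `{` and the brace group itself are matched as two adjacent separators, which leaves an empty string between them.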

Upvotes: 1

Wolph

Reputation: 80031

In Python you could do this:

import re
# Optionally skip a {...} group, then capture the following run of non-brace characters.
regex = re.compile(r'(?:\{.*?\})?([^{}]+)')
print('Matches: %r' % regex.findall('können {konnte, gekonnt} Verb'))

Result:

Matches: ['können ', ' Verb']

That said, I would recommend simply removing everything between { and }, like so:

import re
# Remove every {...} group; the non-greedy .*? stops at the first closing brace.
regex = re.compile(r'\{.*?\}')
print('Output string: %r' % regex.sub('', 'können {konnte, gekonnt} Verb'))

Result:

Output string: 'können  Verb'
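
To get from the cleaned string to the individual words the question asks for, one option is to split it on whitespace. The sketch below is a variation on the removal approach, not the code from the answer: it substitutes a space instead of an empty string, on the assumption that a brace group may sit directly between two words, and the name `words_after_removal` is hypothetical.

```python
import re

# Matches one {...} group; non-greedy so separate groups stay separate.
BRACED = re.compile(r'\{.*?\}')

def words_after_removal(text):
    """Remove {...} groups, then split what is left on whitespace."""
    # Substituting a space (an assumption, not the original '') keeps
    # words apart when a group has no surrounding whitespace.
    return BRACED.sub(' ', text).split()

print(words_after_removal('können {konnte, gekonnt} Verb'))
# ['können', 'Verb']
```

Calling `str.split()` with no arguments collapses runs of whitespace, so the double space left behind by the removal does not produce an empty entry.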

Upvotes: 1
