Daichi

Reputation: 309

Is there a way to use regex to find the substring between two strings?

I know this question may seem to have been asked before, but I've searched and tried adapting the other answers to my example, and I can't get any of them to work.

I have the text:

    ['root(ROOT-0, love-2) s1', 'amod(perve-5, good-4) s2',
    'advmod(love-2, thanks-12) s3', 'amod(mags-16, glossy-15) s4']

I only want the text between amod( and the following -. For example, I want:

'perve' and 'mags'

I've tried:

words = re.findall('\((.*?)\-', v)

but it returns:

['ROOT', 'perve', 'love', 'mags']

Any suggestions would be greatly appreciated.

Upvotes: 0

Views: 72

Answers (3)

J.Doe

Reputation: 434

When I want to find an arbitrary substring between two known substrings, I usually rely on a combination of a lookahead and lookbehind assertion.

import re
for string in List:
    match = re.search(r'(?<=amod\()[^-]+(?=-)', string)
    if match:  # re.search returns None for entries without "amod("
        print(match.group())

Note that you have to use [^-] (any character except a hyphen) because of the lookahead assertion (?=-). A greedy .+ also matches hyphens, so it runs past the first - and only stops at the last - in the string that the lookahead can still match.
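To see the difference concretely, here is a small sketch that runs both variants on one of the strings from the question. The variable names (sample, greedy, negated) are only for illustration.

```python
import re

sample = "amod(perve-5, good-4) s2"

# Greedy .+ first consumes the rest of the string, then backtracks
# until the lookahead (?=-) succeeds, which happens at the LAST hyphen.
greedy = re.search(r"(?<=amod\().+(?=-)", sample).group()

# [^-]+ cannot cross a hyphen, so it stops right before the first one.
negated = re.search(r"(?<=amod\()[^-]+(?=-)", sample).group()

print(greedy)   # perve-5, good
print(negated)  # perve
```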

Hope this is what you wanted.

Upvotes: 0

anubhava

Reputation: 785128

You may use:

>>> import re
>>> test_str = ("    ['root(ROOT-0, love-2) s1', 'amod(perve-5, good-4) s2',\n"
...     "    'advmod(love-2, thanks-12) s3', 'amod(mags-16, glossy-15) s4']")
>>>
>>> print(re.findall(r"amod\(([^-]*)-", test_str))
['perve', 'mags']


RegEx Details:

  • amod\(: Match literal text amod(
  • ([^-]*): Match 0 or more of any characters that are not - and capture it in group #1
  • -: Match a literal -
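If the data is an actual Python list of strings rather than one string, as the question's sample suggests, the same pattern can be applied to each element. This is only a sketch; the names items, pattern and words are my own.

```python
import re

items = ['root(ROOT-0, love-2) s1', 'amod(perve-5, good-4) s2',
         'advmod(love-2, thanks-12) s3', 'amod(mags-16, glossy-15) s4']

# Compile once, then reuse the pattern for every element of the list.
pattern = re.compile(r"amod\(([^-]*)-")

# finditer yields a match object per hit; group(1) is the captured word.
words = [m.group(1) for item in items for m in pattern.finditer(item)]
print(words)  # ['perve', 'mags']
```

Elements without amod( simply contribute nothing, so no None check is needed here.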

Upvotes: 2

Tom Girou

Reputation: 39

This regex seems to do the trick:

(?<=amod\().+?(?=-)
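For reference, a minimal sketch of how this pattern could be used from Python. The text variable below is a shortened version of the question's data that I made up for the example.

```python
import re

text = "'amod(perve-5, good-4) s2', 'amod(mags-16, glossy-15) s4'"

# The lazy .+? grows one character at a time and stops as soon as
# the lookahead sees a hyphen, so each match ends before the first -.
found = re.findall(r"(?<=amod\().+?(?=-)", text)
print(found)  # ['perve', 'mags']
```

Because the pattern has no capture group, re.findall returns the whole match for each hit.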


Upvotes: 0
