Sam

Reputation: 1246

How do I capture the text between a certain character and a string in a multi-line string? (Python)

Let's say we have a string

string="This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)\
 test \
(testing test) test >asdf  \
       test"

I need to get the text between the character > and the string "test".

I tried

re.findall(r'>[^)](.*)test',string, re.MULTILINE )

However, I get

(ascd asdfas -were)\ test \ (testing test) test >asdf.

But I need:

(ascd asdfas -were)\ 

AND

asdf

How can I get those two strings?

Upvotes: 3

Views: 111

Answers (1)

jedwards

Reputation: 30200

What about:

import re

s="""This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

print(re.findall(r'>(.*?)\btest\b', s, re.DOTALL))

Output:

['(ascd asdfas -were)\n', 'asdf\n']
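Each capture above still carries the newline that sits just before test. If that is unwanted, one possible follow-up (a small sketch, not part of the pattern itself) is to strip whitespace from each match. The name cleaned is only illustrative:

```python
import re

s = """This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

matches = re.findall(r'>(.*?)\btest\b', s, re.DOTALL)

# Strip surrounding whitespace (including the trailing newline) from each capture.
cleaned = [m.strip() for m in matches]
print(cleaned)
```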

The only somewhat interesting parts of this pattern are:

  • .*?, where the ? makes .* non-greedy; otherwise you'd get a single, long match instead of two.
  • Using \btest\b rather than plain test as the "ending" identifier (see Jan's comment below), where:

    \b Matches the empty string, but only at the beginning or end of a word....
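To make the two points above concrete, here is a rough sketch comparing the variants. The first half reuses the string s from the answer; the second half uses a made-up input (sample) in which "test" also occurs inside a longer word, which is an assumption added purely for illustration:

```python
import re

s = """This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

# Greedy .* runs on to the last standalone "test", so there is one long match.
greedy = re.findall(r'>(.*)\btest\b', s, re.DOTALL)
# Lazy .*? stops at the first standalone "test" after each ">".
lazy = re.findall(r'>(.*?)\btest\b', s, re.DOTALL)
print(len(greedy), len(lazy))

# Hypothetical input where "test" also appears inside "testing".
sample = ">alpha testing test"
without_b = re.findall(r'>(.*?)test', sample)      # stops inside "testing"
with_b = re.findall(r'>(.*?)\btest\b', sample)     # skips "testing"
print(without_b, with_b)
```

On the string from the question the plain test ending happens to give the same result, so the word boundaries mostly matter for inputs like sample.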

Note, it may be worth reading up on re.DOTALL, as I think that's really what you want. DOTALL lets . match newlines as well, while MULTILINE lets the anchors (^, $) match at the start and end of each line instead of only at the start and end of the entire string. Since you don't use anchors, I think DOTALL is more appropriate.
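As a quick check of the difference between the two flags, the following sketch runs the same pattern over the same string with each flag. The variable names are just for illustration:

```python
import re

s = """This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

pattern = r'>(.*?)\btest\b'

# MULTILINE only changes ^ and $; "." still cannot cross a newline,
# so no ">" is followed by "test" on the same line and nothing matches.
with_multiline = re.findall(pattern, s, re.MULTILINE)

# DOTALL lets "." match newlines, so each match can span lines.
with_dotall = re.findall(pattern, s, re.DOTALL)

print(with_multiline)
print(with_dotall)
```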

Upvotes: 2
