ANjell

Reputation: 181

python - Return Text Between Parentheses

I have a file containing several lines of strings written as:

[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ

I need only the text inside the parentheses. I tried the following code:

import re

readstream = open ("E:\\New folder\\output5.txt","r").read()

stringExtract = re.findall('\[(.*?)\]', readstream, re.DOTALL)
string = re.compile ('\(.*?\)')
stringExtract2 =  string.findall (str(stringExtract))

But some strings are missing from the output; for example, for the line above, the word (with ) is not found. The order of the strings also differs from the file: for the strings (enlar) and (ged ) above, the second one, (ged ), appears before (enlar), like this: ( ged other strings ..... enlar). How can I fix these problems?

Upvotes: 5

Views: 14249

Answers (4)

Phil Cooper

Reputation: 5877

findall looks like your friend here. Don't you just want:

re.findall(r'\(.*?\)',readstream)

returns:

['(W)',
 '(indo)',
 '(ws )',
 '(XP)',
 '(, )',
 '(with )',
 '(the )',
 '(fragment )',
 '(enlar)',
 '(ged )',
 '(for )',
 '(clarity )',
 '(on )',
 '(Fig. )']

Edit: as @vikramls showed, to remove the parens, use: re.findall(r'\((.*?)\)', readstream). Also, note that it is common (but not requested here) to trim trailing whitespace with something like:

re.findall(r'\((.*?) *\)', readstream)
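To make the difference between the three patterns above concrete, here is a small sketch run against the sample line from the question. The variable names (line, with_parens, inner_only, trimmed) are only illustrative:

```python
import re

# Sample line copied from the question.
line = ("[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )"
        "20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ")

with_parens = re.findall(r'\(.*?\)', line)   # whole match, parentheses kept
inner_only = re.findall(r'\((.*?)\)', line)  # capture group drops the parentheses
trimmed = re.findall(r'\((.*?) *\)', line)   # also drops trailing spaces

print(with_parens[:3])  # ['(W)', '(indo)', '(ws )']
print(inner_only[:3])   # ['W', 'indo', 'ws ']
print(trimmed[:3])      # ['W', 'indo', 'ws']
```

When the pattern contains exactly one capture group, findall returns the group contents instead of the whole match, which is why the last two calls return the text without the parentheses.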

Upvotes: 6

ekhumoro

Reputation: 120598

Without regexp (where s is the text read from the file):

[p.split(')')[0] for p in s.split('(') if ')' in p]

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
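As a runnable version of the one-liner above, here is a minimal sketch. The sample string is a shortened copy of the line from the question, and the helper name between_parens is made up for the example:

```python
# Shortened sample based on the line from the question.
s = "[(W)40(indo)25(ws )20(XP)111(, )] TJ"

def between_parens(text):
    # Splitting on '(' leaves each wanted fragment at the start of a piece;
    # cutting that piece at the first ')' keeps only the fragment.
    # Pieces without a ')' (such as the leading '[') are skipped.
    return [piece.split(')')[0] for piece in text.split('(') if ')' in piece]

fragments = between_parens(s)
print(fragments)  # ['W', 'indo', 'ws ', 'XP', ', ']
```

Like the regex versions, this assumes the parentheses are not nested or escaped inside the strings.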

Upvotes: 7

Joran Beasley

Reputation: 113950

Your first problem is:

stringExtract = re.findall('\[(.*?)\]', readstream, re.DOTALL)

I have no idea why you are doing this, and I'm pretty sure you don't want to.

Try this instead:

readstream = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"
stringExtract = re.findall(r'\(([^)]+)\)', readstream, re.DOTALL)

The pattern says: match an opening parenthesis, capture one or more characters that are not a closing parenthesis, then match the closing parenthesis.
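For comparison, a small sketch of how the negated character class behaves differently from the lazy .*? pattern used in the other answers. The sample string here is invented to show the edge cases (empty parentheses and a newline inside parentheses); it is not from the question's file:

```python
import re

# Hypothetical input: a normal group, an empty group, and a group spanning a newline.
sample = "(a)1()2(b\nc)"

lazy = re.findall(r'\((.*?)\)', sample)                    # '.' stops at newlines
lazy_dotall = re.findall(r'\((.*?)\)', sample, re.DOTALL)  # '.' also matches '\n'
negated = re.findall(r'\(([^)]+)\)', sample)               # needs at least one character

print(lazy)         # ['a', '']
print(lazy_dotall)  # ['a', '', 'b\nc']
print(negated)      # ['a', 'b\nc']
```

Since [^)] already matches newlines, the re.DOTALL flag has no effect on the negated-class pattern; it only changes the meaning of the dot.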

Upvotes: 0

vikramls

Reputation: 1822

Try this:

import re

readstream = open ("E:\\New folder\\output5.txt","r").read()
stringExtract2 = re.findall(r'\(([^()]+)\)', readstream)

Input:

readstream = r'[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )]'

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
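If the fragments need to stay grouped by the [...] array they came from, one possible sketch is to run the same pattern on each bracketed group rather than on str() of the whole list. The two-line sample text is made up from pieces of the question's line, and it assumes the strings contain no literal ']' or escaped parentheses:

```python
import re

# Hypothetical two-line input built from fragments of the question's sample.
text = (
    "[(W)40(indo)25(ws )20(XP)] TJ\n"
    "[(enlar)18(ged )20(for )20(clarity )] TJ\n"
)

# Step 1: pull out the content of each [...] array, in file order.
arrays = re.findall(r'\[(.*?)\]', text, re.DOTALL)

# Step 2: extract the parenthesised fragments of each array and join them.
lines = [''.join(re.findall(r'\(([^()]+)\)', array)) for array in arrays]

print(lines)  # ['Windows XP', 'enlarged for clarity ']
```

Working on each array string directly keeps the fragments in their original order and avoids matching against the quoted, escaped text that str() produces for a list.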

Upvotes: 3
