Reputation: 5392
I have many fill-in-the-blank sentences in strings,
e.g. "6d) We took no [pains] to hide it ."
How can I efficiently parse this string (in Python) to be
"We took no to hide it"?
I also would like to be able to store the word in brackets (e.g. "pains") in a list for use later. I think the regex module could be better than Python string operations like split().
Upvotes: 0
Views: 1172
Reputation: 1
import re
m = re.search(".*\) (.*)\[.*\] (.*)","6d) We took no [pains] to hide it .")
if m:
g = m.groups()
print g[0] + g[1]
Output :
We took no to hide it .
Upvotes: 0
Reputation: 2234
For your example you could use this regex:
(.*\))(.+)\[(.+)\](.+)
You will get four groups that you can use to create your resulting string and save the 3. group for later use:
6d)
We took no
pains
to hide it .
I used .+
here because I don't know if your strings always look like your example. You can change the .+
to alphanumeric or sth. more special to your case.
import re
s = '6d) We took no [pains] to hide it .'
m = re.search(r"(.*\))(.+)\[(.+)\](.+)", s)
print(m.group(2) + m.group(4)) # "We took no to hide it ."
print(m.group(3)) # pains
Upvotes: 0
Reputation: 113940
just for fun (to do the gather and substitution in one iteration)
matches = []
def subber(m):
matches.append(m.groups()[0])
return ""
new_text = re.sub("\[(.*?)\]",subber,s)
print new_text
print matches
Upvotes: 2
Reputation: 2567
import re
s = 'this is [test] string'
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print m.group(1)
Output
'test'
Upvotes: 1
Reputation: 924
This will give you all the words inside the brackets.
import re
s="6d) We took no [pains] to hide it ."
matches = re.findall('\[(.*?)\]', s)
Then you can run this to remove all bracketed words.
re.sub('\[(.*?)\]', '', s)
Upvotes: 3