Reputation: 2409
I have the following sed script:
cat foo.txt | sed -e "s/.*\[\([^]]*\)\].*/\1/g" -e "s/ //g" -e "s/'//g"
Which can be translated into three expressions:
[...]
What's a neat way to do something similar with a text file in python?
Upvotes: 3
Views: 1107
Reputation: 97958
import re
s = r"dasdad [some where, dsadasd '''' sadads] hoda"
re.sub(r'[\'\s]+', '', re.sub(r'.*\[([^]]*)\].*', r'\1', s))
Output:
somewhere,dsadasdsadads
Upvotes: 0
Reputation: 10260
Here comes the ugly one-liner, for those who like such things:
>>> import re
>>> [f for f in open("foo.txt", 'r')]
["some string ['foo'] [b a r] [baz] [] extra stuff\n"]
>>> [re.sub(r"[ ']", "", s) for f in open("foo.txt") for s in re.findall(r"\[(.*?)\]", f)]
['foo', 'bar', 'baz', '']
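For those who do not like such things, the one-liner above can be unrolled into ordinary loops. This is only a sketch: the name extract_bracketed is made up for illustration, and it assumes the input is any iterable of strings, such as an open file.

```python
import re

def extract_bracketed(lines):
    """Hypothetical helper: lines is any iterable of strings, e.g. an open file."""
    results = []
    for line in lines:
        # Each match is the text between one [ and the next ]
        for inner in re.findall(r"\[(.*?)\]", line):
            # Drop spaces and single quotes from the captured text
            results.append(re.sub(r"[ ']", "", inner))
    return results

print(extract_bracketed(["some string ['foo'] [b a r] [baz] [] extra stuff\n"]))
# prints: ['foo', 'bar', 'baz', '']
```

With a real file this would be called as extract_bracketed(open("foo.txt")).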
Explanation, step by step:
open() defaults to read-only.
re.findall("\[(.*?)\]", f) extracts the contents of [..]
re.sub("[ ']", "", s) replaces spaces and ' with nothing ("").
Upvotes: 0
Reputation: 14778
Another solution:
import re
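regex = re.compile(r"\[([^\]]+)\]")
out = list()
for line in open("foo.txt", "rt"):
    # str.translate(None, chars) is Python 2 only; str.maketrans() is the Python 3 equivalent
    out.extend(i.translate(str.maketrans("", "", "' ")) for i in re.findall(regex, line.strip()))
print(out)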
Upvotes: 1
Reputation: 76715
You could do it all with regular expressions (re.sub()), but this does it mostly with plain Python, using regular expressions only for the initial capture.
import re
s = "some string ['foo'] [b a r] [baz] [] extra stuff"
pat0 = re.compile(r'\[([^]]*)\]')
lst0 = pat0.findall(s)
lst1 = [s.replace(' ', '') for s in lst0]
lst2 = [s.replace("'", '') for s in lst1]
print(lst2) # prints: ['foo', 'bar', 'baz', '']
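For comparison, here is a rough sketch of the all-regex variant mentioned above, with re.sub() doing the cleanup instead of str.replace(). The names bracket_contents, BRACKETS and UNWANTED are made up for illustration, and folding both cleanup steps into one character class is an assumption about how one might write it.

```python
import re

# Hypothetical names, chosen only for this sketch.
BRACKETS = re.compile(r'\[([^]]*)\]')  # captures the text between [ and ]
UNWANTED = re.compile(r"[ ']")         # matches a space or a single quote

def bracket_contents(text):
    """Return the cleaned contents of every [...] group found in text."""
    return [UNWANTED.sub('', inner) for inner in BRACKETS.findall(text)]

s = "some string ['foo'] [b a r] [baz] [] extra stuff"
print(bracket_contents(s))  # prints: ['foo', 'bar', 'baz', '']
```

The two-step version with str.replace() is arguably easier to read; the regex version just saves a pass over the list.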
Upvotes: 2
Reputation: 50185
import re
with open('foo.txt', 'r') as f:
    read_data = f.readlines()
out_data = []
for line in read_data:
    out_line = re.sub(r".*\[([^]]*)\].*", r"\1", line)
    out_line = re.sub(r" ", r"", out_line)
    out_line = re.sub(r"'", r"", out_line)
    out_data.append(out_line)
# do whatever you want with out_data here
Upvotes: 1