ajmartin

Reputation: 2409

Convert sed with multiple expressions to Python

I have the following sed script:

cat foo.txt | sed -e "s/.*\[\([^]]*\)\].*/\1/g" -e "s/ //g" -e "s/'//g"

It applies three expressions to each line:

  1. Captures the text between [...]
  2. Removes spaces
  3. Removes all single quotes

What's a neat way to do something similar with a text file in Python?

Upvotes: 3

Views: 1107

Answers (5)

perreal

Reputation: 97958

import re

s = r"dasdad [some where, dsadasd '''' sadads] hoda"
# inner sub keeps the text inside [...]; outer sub strips quotes and whitespace
print(re.sub(r'[\'\s]+', '', re.sub(r'.*\[([^]]*)\].*', r'\1', s)))

Output:

somewhere,dsadasdsadads

Upvotes: 0

timss

Reputation: 10260

Here comes the ugly one-liner, for those who like such things:

>>> import re
>>> [f for f in open("foo.txt", 'r')]
["some string ['foo'] [b a r] [baz] [] extra stuff\n"]
>>> [re.sub("[ ']", "", s) for f in open("foo.txt") for s in re.findall(r"\[(.*?)\]", f)]
['foo', 'bar', 'baz', '']

Explanation, in the order the steps run:

  1. Iterate through the file, line by line. open() defaults to read-only mode.
  2. re.findall(r"\[(.*?)\]", f) extracts the contents of each [..] on the line.
  3. Finally, re.sub() replaces spaces and ' with nothing ("").

The for clauses have to appear in this order (file first, then matches), because f must be bound before re.findall() uses it.
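
One difference from the sed command is worth noting: re.findall() returns every [..] group on a line, while the sed expression, with its greedy leading .*, keeps only the last one. A minimal sketch of the difference, using a made-up sample line (the variable names are for illustration only):

```python
import re

line = "some string ['foo'] [b a r] [baz] extra stuff"

# sed-style: the greedy leading .* leaves only the last bracketed group
last_only = re.sub(r".*\[([^]]*)\].*", r"\1", line)

# findall-style: every bracketed group, in order of appearance
all_groups = re.findall(r"\[(.*?)\]", line)

print(last_only)
print(all_groups)
```

Which behaviour is wanted depends on whether the input can contain more than one bracketed group per line.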

Upvotes: 0

Rubens

Reputation: 14778

Another solution:

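import re
regex = re.compile(r"\[([^\]]+)\]")
# Python 3: str.translate() takes a table; this one deletes quotes and spaces
strip_chars = str.maketrans("", "", "' ")

out = []
with open("foo.txt") as f:
    for line in f:
        out.extend(i.translate(strip_chars) for i in regex.findall(line))
print(out)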
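
Since the pattern uses + rather than *, empty brackets ([]) produce no entry in the output. A small sketch of that difference, with a made-up sample string:

```python
import re

sample = "x [a] [] y"

# + requires at least one character between the brackets
non_empty = re.findall(r"\[([^\]]+)\]", sample)
# * also accepts an empty pair of brackets
with_empty = re.findall(r"\[([^\]]*)\]", sample)

print(non_empty)
print(with_empty)
```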

Upvotes: 1

steveha

Reputation: 76715

You could do it all with regular expressions (re.sub()), but this version does most of the work with plain Python and uses a regular expression only for the initial capture.

import re

s = "some string ['foo'] [b a r] [baz] [] extra stuff"

pat0 = re.compile(r'\[([^]]*)\]')

lst0 = pat0.findall(s)

lst1 = [item.replace(' ', '') for item in lst0]
lst2 = [item.replace("'", '') for item in lst1]

print(lst2) # prints: ['foo', 'bar', 'baz', '']

Upvotes: 2

mVChr

Reputation: 50185

import re

with open('foo.txt', 'r') as f:
    read_data = f.readlines()

out_data = []
for line in read_data:
    # keep only the text inside the last [...] on the line
    out_line = re.sub(r".*\[([^]]*)\].*", r"\1", line)
    # remove spaces, then single quotes
    out_line = re.sub(r" ", r"", out_line)
    out_line = re.sub(r"'", r"", out_line)
    out_data.append(out_line)
# do whatever you want with out_data here (each entry keeps its trailing newline)
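
If the goal is to mimic the pipeline end to end, the same three substitutions can be wrapped in a generator. This is only a sketch: sed_like is a hypothetical helper name, the sample lines are made up, and the lines are assumed to still carry their trailing newlines, as they do when iterating over a file.

```python
import re

def sed_like(lines):
    """Yield each line after the three substitutions from the sed command."""
    for line in lines:
        # keep only the text inside [...]; lines without brackets pass through
        line = re.sub(r".*\[([^]]*)\].*", r"\1", line)
        # drop spaces and single quotes
        yield line.replace(" ", "").replace("'", "")

sample = ["dasdad [some where, 'x'] hoda\n", "no brackets here\n"]
result = list(sed_like(sample))
print(result)
```

To process the file from the question, pass the open file object instead of sample and write the results with sys.stdout.writelines().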

Upvotes: 1
