Reputation: 473
I want to add a new line every time my program finds a regex match. I want to keep the matched text and only have a new line begin after it. The text is read from a .txt file.
I am able to find the match, but when I try to add a new line, I get the result shown below under Actual Output.
I have been trying to fix this for hours and would be glad of any help.
Here is a quick example:
In:
STLB 1234 444 text text text
STLB 8796 567 text text text
EDIT In:
STLB 1234 444text text text
STLB 8796 567text text text
Wanted Output:
STLB 1234 444
text text text
STLB 8796 567
text text text
Actual Output:
(STLB.*\d\d\d)
(STLB.*\d\d\d)
Here is my code:
stlb_match = re.compile('|'.join(['STLB.*\d\d\d']))
with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    lines = fin5.read().splitlines()
    for i, line in enumerate(lines):
        matchObj1 = re.match(start_rx, line)
        if not matchObj1:
            first_two_word = (" ".join(line.split()[:2]))
            if re.match(stlb_match,line):
                line =re.sub(r'(STLB.*\d\d\d)', r'(STLB.*\d\d\d)'+' \n', line)
            elif re.match(first_two_word, line):
                line = line.replace(first_two_word, "\n" + first_two_word)
        fout5.write(line)
Upvotes: 1
Views: 214
Reputation: 2624
Assuming the lines always have the format STLB <number> <number> <text>, this should work:
with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    for l in fin5:
        l = re.sub(r'(STLB\s*\d+\s*\d+)\s*', r'\1\n', l)
        fout5.write(l)
    fout5.write('\n')
Input:
STLB 1234 444 text text text
STLB 8796 567 text text text
Output:
STLB 1234 444
text text text
STLB 8796 567
text text text
Note the \s* at the end of the regex. It sits outside the capturing group, so the whitespace after the second number is matched and replaced, but it is not part of \1 and therefore does not appear in the output.
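The effect of leaving the trailing \s* outside the group can be checked with a short sketch. The pattern is the one from this answer; the helper name split_after_header is made up for illustration, and the two sample lines are taken from the question (with and without a space before the text).

```python
import re

# Pattern from the answer: the group keeps "STLB <number> <number>",
# the trailing \s* (outside the group) swallows any following whitespace.
STLB_RX = re.compile(r'(STLB\s*\d+\s*\d+)\s*')


def split_after_header(line):
    """Hypothetical helper: put a newline right after the STLB prefix."""
    return STLB_RX.sub(r'\1\n', line)


spaced = split_after_header('STLB 1234 444 text text text')
unspaced = split_after_header('STLB 1234 444text text text')

print(repr(spaced))    # 'STLB 1234 444\ntext text text'
print(repr(unspaced))  # 'STLB 1234 444\ntext text text'
```

Both input variants give the same result because \s* also matches zero characters, so the pattern does not depend on a space being present after the second number.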
A more compact version using writelines:
with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    fout5.writelines([re.sub(r'(STLB\s*\d+\s*\d+)\s*', r'\1\n', l) for l in fin5])
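For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes nothing about the real in_file5 and out_file5: the function name split_file and the temporary file names are invented for the example, and the sample data is the two lines from the question.

```python
import os
import re
import tempfile

STLB_RX = re.compile(r'(STLB\s*\d+\s*\d+)\s*')


def split_file(in_path, out_path):
    """Hypothetical wrapper: copy in_path to out_path, breaking each
    line after its 'STLB <number> <number>' prefix."""
    with open(in_path, 'r', encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        # Each line read from the file keeps its own trailing newline,
        # so no extra newline has to be written afterwards.
        fout.writelines(STLB_RX.sub(r'\1\n', line) for line in fin)


with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'in.txt')    # stand-in for in_file5
    out_path = os.path.join(tmp, 'out.txt')  # stand-in for out_file5
    with open(in_path, 'w', encoding='utf-8') as f:
        f.write('STLB 1234 444 text text text\n'
                'STLB 8796 567text text text\n')
    split_file(in_path, out_path)
    with open(out_path, encoding='utf-8') as f:
        result = f.read()

print(result)
```

Iterating over the file object keeps memory use low for large files, since only one line is held at a time.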
Let me know if this works for you
Upvotes: 3
Reputation: 91430
Your replacement string is wrong: you cannot put a regex in it. Refer to the captured group with a backreference instead:
line = 'STLB 1234 444 text text text'
line = re.sub(r'(STLB.*\d\d\d)', r"\1\n", line)
print(line)
Output:
STLB 1234 444
text text text
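As a small illustration of how the replacement argument is interpreted, the sketch below shows three equivalent ways to reuse the captured text. The variable names are invented for the example. As far as I know, Python 3.7 and later go further than printing the pattern literally: an unknown escape such as \d in the replacement raises re.error.

```python
import re

line = 'STLB 1234 444 text text text'
pattern = r'(STLB.*\d\d\d)'

# \1 and \g<1> both stand for whatever the first group captured.
numbered = re.sub(pattern, r'\1\n', line)
explicit = re.sub(pattern, r'\g<1>\n', line)

# A callable replacement receives the match object instead.
via_function = re.sub(pattern, lambda m: m.group(1) + '\n', line)

print(repr(numbered))  # 'STLB 1234 444\n text text text'
```

The \g<1> form is handy when the backreference is directly followed by a digit, where \1 would be ambiguous.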
Or, if you want to remove the space at the beginning of the second line, match it after the group so it is dropped:
line = re.sub(r'(STLB.*\d\d\d) ', r"\1\n", line)
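One thing to keep in mind with .* is that it is greedy, so it extends to the last run of three digits on the line. The sketch below uses a made-up line whose text part contains digits (not from the question) to compare this pattern with the stricter one from the other answer.

```python
import re

# Hypothetical input: the free text itself contains a three-digit number.
line = 'STLB 1234 444 text 999 text'

# Greedy .* runs up to the LAST three digits followed by a space.
greedy = re.sub(r'(STLB.*\d\d\d) ', r'\1\n', line)

# Spelling out "two numbers" stops right after the second number.
specific = re.sub(r'(STLB\s*\d+\s*\d+)\s*', r'\1\n', line)

print(repr(greedy))    # 'STLB 1234 444 text 999\ntext'
print(repr(specific))  # 'STLB 1234 444\ntext 999 text'
```

If the text after the two numbers never contains digits, both patterns split at the same place.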
Upvotes: 1