Mady

Reputation: 473

Add a new line after a regex

I want to insert a newline every time my program finds a regex match. The matched text should be kept, and a new line should begin right after it. The text is read from a .txt file. I can find the match, but when I try to add the newline, I get the result shown below under Actual Output. I have been trying to fix this for hours and would appreciate any help.

Here is a quick example:

In:

STLB 1234 444 text text text
STLB 8796 567 text text text

EDIT In:

STLB 1234 444text text text

STLB 8796 567text text text

Wanted Output:

STLB 1234 444
text text text
STLB 8796 567
text text text

Actual Output:

(STLB.*\d\d\d) 

(STLB.*\d\d\d) 

Here is my code:

stlb_match = re.compile('|'.join(['STLB.*\d\d\d']))

with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    lines = fin5.read().splitlines()

    for i, line in enumerate(lines):
        matchObj1 = re.match(start_rx, line)

        if not matchObj1:
            first_two_word = (" ".join(line.split()[:2]))

            if re.match(stlb_match,line):
                line =re.sub(r'(STLB.*\d\d\d)', r'(STLB.*\d\d\d)'+' \n', line)
            elif re.match(first_two_word, line):
                line = line.replace(first_two_word, "\n" + first_two_word)

        fout5.write(line)

Upvotes: 1

Views: 214

Answers (2)

JoshuaCS

Reputation: 2624

Assuming the lines always have the format STLB <number> <number> <text>, this should work:

Code

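import re

with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    for l in fin5:
        # Keep the captured header and replace the whitespace after it with a newline
        l = re.sub(r'(STLB\s*\d+\s*\d+)\s*', r'\1\n', l)

        fout5.write(l)
        # Extra newline: this is what produces the blank line between records in Output
        fout5.write('\n')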

Input

STLB 1234 444 text text text
STLB 8796 567 text text text

Output

STLB 1234 444
text text text

STLB 8796 567
text text text

Note that the \s* at the end of the regex sits outside the capturing group, so the whitespace after the second number is matched but is not part of \1, and is therefore dropped from the output.
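To make the effect of the trailing \s* concrete, here is a minimal sketch that applies the same pattern to in-memory strings instead of files. The names HEADER, split_header, samples, and results are only illustrative and are not part of the original code. The second sample follows the EDIT input from the question, where no space separates the number from the text.

```python
import re

# Same pattern as above: the group captures the header,
# the trailing \s* is matched but stays outside the group.
HEADER = re.compile(r'(STLB\s*\d+\s*\d+)\s*')


def split_header(line):
    """Return the line with a newline inserted right after the STLB header."""
    return HEADER.sub(r'\1\n', line)


samples = [
    'STLB 1234 444 text text text',  # space after the second number
    'STLB 8796 567text text text',   # no space (the EDIT input)
]

results = [split_header(s) for s in samples]

for r in results:
    print(r)
```

Because \s* also matches zero characters, both variants should produce the header on one line and the text on the next, without a leading space.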

Using a list comprehension and writelines:

with open(in_file5, 'r', encoding='utf-8') as fin5, open(out_file5, 'w', encoding='utf-8') as fout5:
    fout5.writelines([re.sub(r'(STLB\s*\d+\s*\d+)\s*', r'\1\n', l) for l in fin5])

Let me know if this works for you

Upvotes: 3

Toto

Reputation: 91430

Your replacement string is wrong: you cannot put a regex in it. Refer to the captured group with a backreference instead. Change it to:

import re

line = 'STLB 1234 444 text text text'
line = re.sub(r'(STLB.*\d\d\d)', r"\1\n", line)
print(line)

Output:

STLB 1234 444
 text text text

Or:

line = re.sub(r'(STLB.*\d\d\d) ', r"\1\n", line)

if you want to remove the space at the beginning of the second line.
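As a small illustration of the point about the replacement argument: re.sub treats it as a template, not as a pattern, so the matched text has to be pulled in through a group reference. The sketch below, which assumes the sample line from the question, shows three equivalent ways to do that. The variable names are only illustrative.

```python
import re

line = 'STLB 1234 444 text text text'
pattern = re.compile(r'(STLB.*\d\d\d) ')

# \1 inserts whatever capture group 1 matched.
with_backref = pattern.sub(r'\1\n', line)

# \g<1> is the explicit spelling of the same reference.
with_g_syntax = pattern.sub(r'\g<1>\n', line)

# A callable receives the match object and returns the replacement text.
with_callable = pattern.sub(lambda m: m.group(1) + '\n', line)

print(with_backref)
```

One caveat with this pattern: .* is greedy, so if the text part of a line itself contained three digits followed by a space, the split would happen after that later position. A stricter pattern such as the \d+\s*\d+ form from the other answer avoids this, assuming the lines really have that shape.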

Upvotes: 1
