Baobab1988

Reputation: 715

Python to combine multiple lines in a txt file if certain criteria match

Could someone help me combine multiple lines in a txt file into a single line when the text between the tags spans more than one line?

my.txt

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,

Jane

www.url.com

</start>

desired output.txt:

<start>Hello World.</start>
<start>Hello World, this is my message. Regards, Jane www.url.com</start>

my code so far:

f = open('/path/to/my.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('<start>'):
        line = line.rstrip('\n')
        print(line)
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print (currentline)

f.close()

output:

<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,
Regards,
Regards,Jane
Regards,Jane
Regards,Janewww.url.com
Regards,Janewww.url.com
Regards,Janewww.url.com</start>

Thank you in advance!

Upvotes: 0

Views: 78

Answers (1)

Anwarvic

Reputation: 12992

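You can read the whole file into a string, match each <start>...</start> block with a regular expression, and collapse the line breaks inside each block into single spaces: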

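import re

with open('/path/to/my.txt', 'r') as fin:
    text = fin.read()

# re.DOTALL lets "." match newlines, so one block can span several lines.
pattern = r"<start>(.*?)</start>"
output = []
for body in re.findall(pattern, text, re.DOTALL):
    # Replace each line break (and the whitespace around it) with one space.
    joined = re.sub(r"\s*\n\s*", " ", body).strip()
    output.append("<start>" + joined + "</start>")
print(output)
#['<start>Hello World.</start>',
# '<start>Hello World, this is my message. Regards, Jane www.url.com</start>']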
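
If the file is too large to read in one go, a line-by-line version closer to the loop in the question may be easier to adapt. The following is only a sketch: the function name join_tagged_blocks is made up for illustration, and it assumes every block begins with <start> at the start of a line and ends with </start> at the end of a line, as in the sample. Text outside a block is ignored.

```python
import io


def join_tagged_blocks(lines, open_tag="<start>", close_tag="</start>"):
    """Yield one joined line per open_tag ... close_tag block (hypothetical helper)."""
    parts = []      # non-empty pieces of the block currently being read
    inside = False  # True between an opening tag and its closing tag
    for raw in lines:
        line = raw.strip()
        if not inside:
            if not line.startswith(open_tag):
                continue  # assumption: anything outside a block is skipped
            inside = True
        if line.endswith(close_tag):
            # Cut the closing tag off so it can be re-attached without a space.
            line = line[:-len(close_tag)].rstrip()
            if line:
                parts.append(line)
            yield " ".join(parts) + close_tag
            parts = []
            inside = False
        elif line:
            parts.append(line)  # blank lines inside a block are dropped


# Sample input from the question, held in memory instead of a real file.
sample = """<start>Hello World.</start>
<start>Hello World, this is my message.


Regards,

Jane

www.url.com

</start>
"""

result = list(join_tagged_blocks(io.StringIO(sample)))
for line in result:
    print(line)
```

To produce output.txt, pass the open file object for my.txt instead of the io.StringIO wrapper and write each yielded line, followed by a newline, to the output file. The reason the loop in the question prints partial results is that it prints currentline on every iteration and never resets it; here a block is emitted only once its closing tag has been seen.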

Upvotes: 2
