Reputation: 715
Could someone help me combine multiple lines in a text file into a single line whenever the text between the tags is not already on one line?
my.txt:
<start>Hello World.</start>
<start>Hello World, this is my message.
Regards,
Jane
www.url.com
</start>
Desired output.txt:
<start>Hello World.</start>
<start>Hello World, this is my message. Regards, Jane www.url.com</start>
My code so far:
f = open('/path/to/my.txt', 'r')
currentline = ""
for line in f:
    if line.startswith('<start>'):
        line = line.rstrip('\n')
        print(line)
    else:
        line = line.rstrip('\n')
        currentline = currentline + line
        print(currentline)
f.close()
Actual output:
<start>Hello World.</start>
<start>Hello World, this is my message.
Regards,
Regards,
Regards,Jane
Regards,Jane
Regards,Janewww.url.com
Regards,Janewww.url.com
Regards,Janewww.url.com</start>
Thank you in advance!
Upvotes: 0
Views: 78
Reputation: 12992
You can do something like this:
import re
with open('/path/to/my.txt', 'r') as fin:
    text = fin.read()
pattern = r"(<start>(.|\n)*?</start>)"
output = []
for utter in re.findall(pattern, text, re.MULTILINE):
    output.append(re.sub("\n+", ' ', utter[0]))
print(output)
#['<start>Hello World.</start>',
# '<start>Hello World, this is my message. Regards, Jane www.url.com </start>']
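Note that the result above keeps a space before the closing tag, because the newline in front of </start> is also turned into a space. If you need the exact output from the question, one possible variation is to capture only the text between the tags and normalise its whitespace. This is just a sketch: the names BLOCK, join_blocks and sample are made up for the example, and the sample string stands in for the contents of my.txt.

```python
import re

# Non-greedy match of everything between the tags.
# re.DOTALL lets "." match newlines, so a block may span several lines.
BLOCK = re.compile(r"<start>(.*?)</start>", re.DOTALL)


def join_blocks(text):
    """Return every <start>...</start> block as a single line (hypothetical helper)."""
    lines = []
    for match in BLOCK.finditer(text):
        # split() without arguments drops leading/trailing whitespace and
        # treats any run of whitespace (newlines included) as one separator.
        inner = " ".join(match.group(1).split())
        lines.append(f"<start>{inner}</start>")
    return lines


# Stand-in for the contents of my.txt from the question.
sample = (
    "<start>Hello World.</start>\n"
    "<start>Hello World, this is my message.\n"
    "Regards,\n"
    "Jane\n"
    "www.url.com\n"
    "</start>\n"
)

result = join_blocks(sample)
print("\n".join(result))
```

Be aware that this also collapses repeated spaces inside a message into one, which may or may not be what you want. To produce output.txt, you could read the real file into text as in the code above and write "\n".join(join_blocks(text)) to the output file.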
Upvotes: 2