Reputation: 107
New to coding so sorry if this is a silly question.
I have some text that I'm attempting to format to make it more pleasant to read, so I tried my hand at writing a short program in Python to do it for me. I initially removed extra paragraph breaks in MS-Word using the find-and-replace option. The input text looks something like this:
This is a sentence. So is this one. And this.
(empty line)
This is the next line
(empty line)
and some lines are like this.
I want to eliminate all empty lines, so that there is no spacing between lines, and ensure no sentences are left hanging mid-way like in the bit above. All new lines should begin with 2 (two) empty spaces, represented by the $ symbol below. So after formatting it should look something like this:
$$This is a sentence. So is this one. And this.
$$This is the next line and some lines are like this.
I wrote the following script:
import os
directory = "C:/Users/DELL/Desktop/"
filename = "test.txt"
path = os.path.join(directory, filename)
with open(path,"r") as f_in, open(directory+"output.txt","w+") as f_out:
    temp = "  "
    for line in f_in:
        curr_line = line.strip()
        temp += curr_line
        #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
        if curr_line:
            if temp[-1]==".": #check if sentence is complete
                f_out.write(temp)
                temp = "\n  " #two blank spaces here
It eliminates all blank lines, indents new lines by two spaces, and joins hanging sentences, but doesn't insert the necessary blank space, so the output currently looks like this (note the missing space between the words "line" and "and"):
$$This is a sentence. So is this one. And this.
$$This is the next lineand some lines are like this.
I tried to fix this by changing the following lines of code to read as follows:
temp += " " + curr_line
temp = "\n " #one space instead of two
and that doesn't work, and I'm not entirely sure why. It might be an issue with the text itself but I'll check on that.
Any advice would be appreciated, and if there is a better way to do what I want than this convoluted mess that I wrote, then I would like to know that as well.
EDIT: I seem to have fixed it. In my text (very long so I didn't notice it at first) there were two lines separated by 2 (two) empty lines, and so my attempt at fixing it didn't work. I moved one line a bit further below to give the following loop, which seems to have fixed it:
for line in f_in:
    curr_line = line.strip()
    #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
    if curr_line:
        temp += " " + curr_line
        if temp[-1]==".": #check if sentence is complete
            f_out.write(temp)
            temp = "\n "
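To see why moving the concatenation inside the `if curr_line:` check matters, the loop can be run on an in-memory list of lines instead of files. This is only a sketch: the function name `reflow` and the `skip_blank_first` flag are made up for illustration, and it returns a string rather than writing to `f_out`.

```python
def reflow(lines, skip_blank_first):
    """Join hanging lines into sentences; each output line starts with two spaces.

    skip_blank_first=False mimics the first attempted fix (append before the
    blank-line check); True mimics the loop from the EDIT.
    """
    out = []
    temp = " "
    for line in lines:
        curr_line = line.strip()
        if not skip_blank_first:
            temp += " " + curr_line      # runs even for blank lines
        if curr_line:
            if skip_blank_first:
                temp += " " + curr_line  # runs only for non-blank lines
            if temp[-1] == ".":          # sentence is complete
                out.append(temp)
                temp = "\n "
    return "".join(out)


# Two consecutive blank lines between the halves of one sentence
sample = ["This is the next line\n", "\n", "\n", "and some lines are like this.\n"]
print(repr(reflow(sample, skip_blank_first=False)))
print(repr(reflow(sample, skip_blank_first=True)))
```

With the append placed before the check, every blank line adds a stray space, so the two halves end up separated by three spaces instead of one. With the append inside the check, blank lines contribute nothing.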
I also saw that an answer below initially had a bit of Regex in it, I'll have to learn that at some point in the future I suppose.
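For reference, a regex-based version might look roughly like the sketch below. This is not the regex from that answer (which isn't shown here), just one possible approach: drop blank lines, then replace any newline that does not follow a period with a space. The function name `reflow_regex` is hypothetical.

```python
import re


def reflow_regex(text):
    """Remove blank lines, join lines that don't end in '.', indent by two spaces."""
    # Keep only non-blank lines, stripped of surrounding whitespace
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    joined = "\n".join(lines)
    # A newline NOT preceded by a period means the sentence continues
    joined = re.sub(r"(?<!\.)\n", " ", joined)
    return "\n".join("  " + ln for ln in joined.split("\n"))


text = (
    "This is a sentence. So is this one. And this.\n"
    "\n"
    "This is the next line\n"
    "\n"
    "and some lines are like this.\n"
)
print(reflow_regex(text))
```

Note that this reads the whole text into memory at once, which is fine for a single document but differs from the line-by-line loop above.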
Thanks for the help everyone.
Upvotes: 0
Views: 104
Reputation: 12157
This should work. It's effectively the same as yours but a bit more efficient. It doesn't use string concatenation with + or += (which can be slow when repeated in a loop) but instead saves incomplete lines in a list. It then writes 2 spaces, the saved pieces joined by spaces, and then a newline. This simplifies things by writing only when a sentence is complete.
temp = []
with open(path_in, "r") as f_in, open(path_out, "w") as f_out:
    for line in f_in:
        curr_line = line.strip()
        if curr_line:
            temp.append(curr_line)
            if curr_line.endswith('.'): # write our line
                f_out.write('  ')
                f_out.write(' '.join(temp))
                f_out.write('\n')
                temp.clear() # reset temp
outputs
This is a sentence. So is this one. And this.
This is the next line and some lines are like this.
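One thing to watch for with this loop (and the one in the question): if the file ends with text that has no closing period, that text stays in `temp` and is never written. Below is a minimal sketch of one way to handle it, assuming you would want the leftover written as its own line. `reflow_lines` is a hypothetical helper name, and it accepts any iterable of lines so it can be tried without files.

```python
def reflow_lines(lines, indent="  "):
    """Yield indented output lines, joining fragments until one ends with '.'."""
    temp = []
    for line in lines:
        curr_line = line.strip()
        if not curr_line:
            continue                      # skip blank lines
        temp.append(curr_line)
        if curr_line.endswith("."):       # sentence complete
            yield indent + " ".join(temp)
            temp.clear()
    if temp:                              # trailing text without a period
        yield indent + " ".join(temp)


text = """This is a sentence. So is this one. And this.

This is the next line

and some lines are like this.

trailing fragment"""

result = list(reflow_lines(text.splitlines()))
print("\n".join(result))
```

To use it with files, something like `f_out.writelines(s + "\n" for s in reflow_lines(f_in))` should work, since a file object is itself an iterable of lines.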
Upvotes: 1