Lindsay

Reputation: 25

Removing line breaks in Python output

I am cleaning a text file and have written the following code to remove unwanted characters. My problem is that the final output appears as a list of words, when I want it as continuous text. I think the issue is in this line, which is intended to remove line breaks by replacing each newline, i.e. "(\n)", with a space:

    Step4 = re.sub(r"(\n)"," ",Step3)
    print(Step4)

Full code as follows:

f=open("/Applications/Python 3.9/cleaning text.txt",encoding='Latin-1')
raw=f.read()
#print(raw)
import re
import nltk
from nltk import word_tokenize
Data = re.split(r" ",raw)
for D in Data:
#    print(str(raw)+'\n')
    Step1 = re.sub(r"(\\.*)","",D)
#    print(Step1)
    Step2 = re.sub(r"(M)","hl",Step1)
#    print(Step2)
    Step3 = re.sub(r"(\[aa\])","[a::]",Step2)
#    print(Step3)
    Step4 = re.sub(r"(\n)"," ",Step3)
    print(Step4)

Upvotes: 0

Views: 502

Answers (1)

ultdevchar

Reputation: 150

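I don't think you need to split the whole text into a list of words. Your loop calls print() once per word, and print() ends every call with a newline, which is why the output looks like a list. You can pass the raw text directly to re.sub() instead. If you also want to remove whitespace from the beginning or end of the raw text, you can use strip().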

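import re

with open("/Applications/Python 3.9/cleaning text.txt", encoding="Latin-1") as f:
    raw = f.read()

# Work on the whole text at once instead of word by word
raw = raw.strip()
Step1 = re.sub(r"(\\.*)", "", raw)            # remove a backslash and the rest of its line
Step2 = re.sub(r"(M)", "hl", Step1)           # replace M with hl
Step3 = re.sub(r"(\[aa\])", "[a::]", Step2)   # replace [aa] with [a::]
Step4 = re.sub(r"(\n)", " ", Step3)           # replace line breaks with spaces
print(Step4)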
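
For what it's worth, here is a minimal sketch of the same idea wrapped in a function, so it can be tried on a small string without the file. The function name clean_text and the sample string are made up for illustration. One assumption to flag: in your loop the text was split on spaces first, so r"(\\.*)" could only delete up to the end of the current word, whereas on the whole text it deletes up to the end of the line. The sketch assumes you want the per-word behaviour and uses r"\\[^ \n]*" for that; if deleting to the end of the line is what you want, keep the original pattern.

```python
import re


def clean_text(raw: str) -> str:
    """Apply the four substitutions to the whole text and return one string."""
    text = raw.strip()
    # Remove a backslash and what follows it, stopping at the next space
    # or line break (assumed to match the word-by-word behaviour).
    text = re.sub(r"\\[^ \n]*", "", text)
    # Replace every "M" with "hl".
    text = re.sub(r"M", "hl", text)
    # Replace the literal "[aa]" with "[a::]".
    text = re.sub(r"\[aa\]", "[a::]", text)
    # Replace each line break with a single space.
    text = re.sub(r"\n", " ", text)
    return text


# Hypothetical sample input, two lines long
sample = "Mello world\\tag\nthis is [aa] test\n"
print(clean_text(sample))
```

If you would rather keep the loop, collecting the cleaned words in a list and printing " ".join(words) once at the end, or calling print(Step4, end=" "), should also avoid the one-word-per-line output.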

Upvotes: 1
