hgv

Reputation: 237

Python readline with custom delimiter

Novice here. I am trying to read lines from a file, but one line in the .txt file has a \n somewhere in the middle. When I read that line with .readline, Python splits it at that point and outputs it as two lines.

f = open("f.txt", mode='r', encoding='utf8')

for i in range(4):
    lineText = f.readline()
    print(lineText)

f.close()

Upvotes: 16

Views: 36315

Answers (2)

Serge Ballesta

Reputation: 149155

Python 3 lets you define what counts as a newline for a particular file. This is seldom needed, because the default universal newlines mode is very tolerant:

When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller.

So here you should make it explicit that only '\r\n' is an end of line:

f = open("f.txt", mode='r', encoding='utf8', newline='\r\n')

# use enumerate to show that the second line is read as a whole
for i, line in enumerate(f):
    print(i, line)
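
To see the difference between the two modes, here is a minimal self-contained sketch. It assumes the real file ends its lines with '\r\n' and has a lone '\n' inside one of them. The sample bytes, the temporary file name and the helper `read_lines` are made up for illustration. Note that with newline='\r\n' the line ending is returned untranslated, so each line still ends in '\r\n'.

```python
import os
import tempfile


def read_lines(path, newline=None):
    """Return the lines produced by iterating over a text file (hypothetical helper)."""
    with open(path, mode="r", encoding="utf8", newline=newline) as f:
        return list(f)


# Hypothetical sample: records end in "\r\n", the second one has a stray "\n".
sample = b"first\r\nsec\nond\r\nthird\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Write in binary mode so the line endings are stored exactly as given.
    with open(path, "wb") as f:
        f.write(sample)

    # Default (newline=None): "\n", "\r" and "\r\n" all end a line.
    default_lines = read_lines(path)
    # newline="\r\n": only "\r\n" ends a line, and it is not translated.
    crlf_lines = read_lines(path, newline="\r\n")

print(default_lines)
print(crlf_lines)
```

If the file turns out to use plain '\n' everywhere, this setting will not help, because there is then nothing to tell a real line end from the stray one.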

Upvotes: 18

Roomm

Reputation: 924

Instead of using the readline function, you can read the whole content and split it into lines with a regex:

import re

with open("txt", "r") as f:
    content = f.read()
    # remove end line characters
    content = content.replace("\n", "")
    # split by lines
    lines = re.compile(r"(\[[0-9/, :\]]+)").split(content)
    # clean "" elements
    lines = [x for x in lines if x != ""]
# join by pairs
lines = [i + j for i, j in zip(lines[::2], lines[1::2])]

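If every line starts with the same kind of [...] prefix, you can split on that prefix and then drop the "" elements from the result. Because the regex uses a capturing group, the prefixes are kept in the list, so you can rejoin each prefix with the text that follows it using the zip function (https://stackoverflow.com/a/5851033/1038301)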
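
As a rough illustration of the same steps on an in-memory string, here is a small sketch. The sample text is hypothetical: it assumes each record starts with a bracketed date and time, which is what the regex above suggests but which the question does not show as text.

```python
import re

# Hypothetical sample: every record starts with a bracketed timestamp,
# and the second record has a stray "\n" in the middle of its text.
content = (
    "[01/02/2019, 10:15:00] first message\n"
    "[01/02/2019, 10:16:00] second mes\nsage\n"
    "[01/02/2019, 10:17:00] third message\n"
)

# The capturing group makes split() keep the matched prefixes in its result.
prefix = re.compile(r"(\[[0-9/, :\]]+)")

# Remove every newline, then split on the prefixes.
parts = prefix.split(content.replace("\n", ""))
# Drop the empty string that appears before the first prefix.
parts = [p for p in parts if p != ""]
# Rejoin each prefix (even index) with the text after it (odd index).
records = [head + body for head, body in zip(parts[::2], parts[1::2])]

for record in records:
    print(record)
```

Since every "\n" is removed before splitting, the broken record is glued back together with nothing in place of the stray newline. This also relies on "[" not appearing inside the message text itself.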

Upvotes: 1
