Y_S.

Reputation: 13

f.readline() doesn't capture the last line of the file

I am reading a very large text file, several million lines long, with readline(). However, no matter what I try, the last line of the file is not captured.

The file I am reading looks like this:

$ tail file.txt
22  rs1193135566    0   50807787    C   G   0   0   0   0   NA  0   0   0   NA  NA  0
22  rs1349597430    0   50807793    T   G   0   0   0   0   0   0   0   NA  NA  NA  NA
22  rs1230501076    0   50807799    T   G   0   0   NA  NA  0   0   0   NA  0   NA  0
22  22_50807803 0   50807803    C   G   0   0   0   0   0   0   0   0   0   NA  0
22  rs1488400844    0   50807810    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1279244475    0   50807811    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1346432135    0   50807812    G   A   0   NA  0   0   0   0   0   0   0   NA  0
22  rs1340490361    0   50807813    C   G   0   0   0   NA  0   0   0   0   0   NA  0
22  22_50807816 0   50807816    G   T   0   0   0   NA  0   0   0   0   0   NA  0
22  rs1412997563    0   50807818    G   C   0   0   0   NA  0   0   0   0   0   NA  0

And my code looks like this:

with open('/path/file.txt', 'r') as f:
    for l in f:
        line = l.rstrip('\n').split("\t")
        print(line)

The last line of the file comes out empty, as [''].

The output looks like this:

['22', 'rs1250150067', '0', '50807769', 'G', 'A', 'NA', '0', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1193135566', '0', '50807787', 'C', 'G', '0', '0', '0', '0', 'NA', '0', '0', '0', 'NA', 'NA', '0']
['22', 'rs1230501076', '0', '50807799', 'T', 'G', '0', '0', 'NA', 'NA', '0', '0', '0', 'NA', '0', 'NA', '0']
['22', 'rs1488400844', '0', '50807810', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['22', 'rs1346432135', '0', '50807812', 'G', 'A', '0', 'NA', '0', '0', '0', '0', '0', '0', '0', 'NA', '0']
['22', '22_50807816', '0', '50807816', 'G', 'T', '0', '0', '0', 'NA', '0', '0', '0', '0', '0', 'NA', '0']
['']

Upvotes: 1

Views: 1667

Answers (3)

Prajyot Naik

Reputation: 86

You are reading only one line. Try f.readlines() instead, which reads all the lines into a list. If you want to access the lines individually, use indexing:

with open('/path/file.txt', 'r') as f:
    lines = f.readlines()
print(lines[0])  # display the 1st line
print(lines[1])  # display the 2nd line

And so on. You can also loop over the lines after reading them:

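with open('/path/file.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    print(line.rstrip('\n'))  # each line keeps its '\n'; strip it to avoid blank lines in the output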
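A self-contained sketch of the readlines() approach. It assumes an in-memory io.StringIO holding a few made-up, tab-separated rows as a stand-in for /path/file.txt. Negative indexing such as lines[-1] gives the last line directly:

```python
import io

# Made-up, tab-separated rows standing in for /path/file.txt.
f = io.StringIO('22\trs1\t0\n22\trs2\t0\n22\trs3\t0\n')

lines = f.readlines()          # list with one string per line, '\n' kept
print(lines[0].rstrip('\n'))   # first line
print(lines[-1].rstrip('\n'))  # last line
print(len(lines))              # number of lines read
```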

Edit 1: Judging from the output you provided, your loop is not reading every line: only the second, fourth, sixth (and so on) lines from the end appear in the output.

Also try strip() instead of rstrip('\n'), since strip() removes whitespace from both ends of the string.
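A small illustration of the difference, using hypothetical sample strings (not rows from the question's file). Because a tab counts as whitespace, strip() also removes a trailing tab, so an empty last field disappears, while rstrip('\n') removes only the newline:

```python
# Hypothetical sample lines; the second one has an empty last field.
full = '22\trs1\t0\n'
empty_last = '22\trs1\t\n'

print(full.rstrip('\n').split('\t'))        # ['22', 'rs1', '0']
print(full.strip().split('\t'))             # ['22', 'rs1', '0']
print(empty_last.rstrip('\n').split('\t'))  # ['22', 'rs1', '']
print(empty_last.strip().split('\t'))       # ['22', 'rs1']  (empty field lost)
```

Neither call changes what an empty string produces: ''.strip().split('\t') is still [''].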

Upvotes: 1

tripleee

Reputation: 189457

You are discarding every other line.

for line in f already reads a line into line. You then discard that and fetch another line with line = f.readline(). My Python 3.5.1 actually warns and aborts:

ValueError: Mixing iteration and read methods would lose data
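A minimal reproduction of the effect, assuming an in-memory io.StringIO with five made-up lines as a stand-in for the file. io.StringIO allows mixing iteration with readline(), whereas a real file object may raise the ValueError shown above. Each pass through the loop consumes two lines, and with an odd line count the final readline() hits end of file and returns '', which splits into ['']:

```python
import io

def buggy_rows(text):
    """Reproduce the pattern described above: iterate the file AND call readline()."""
    f = io.StringIO(text)
    rows = []
    for line in f:           # iteration reads one line...
        line = f.readline()  # ...which is discarded when readline() reads the next
        rows.append(line.rstrip('\n').split('\t'))
    return rows

# Five made-up lines: rs1 .. rs5.
sample = ''.join(f'22\trs{i}\n' for i in range(1, 6))
print(buggy_rows(sample))
# [['22', 'rs2'], ['22', 'rs4'], ['']]
```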

You can read all the lines into memory at once, or process one at a time. I generally recommend the latter unless your processing needs to have all the data in memory in the end (and even then you probably need to parse it into a sane structure, so keeping the raw data in memory is just wasteful).

with open('/path/file.txt', 'r') as f:
    for line in f:
        print(line.rstrip('\n').split('\t'))   # or process line
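One way to structure the one-line-at-a-time approach is a generator that yields parsed rows, so the caller never holds the raw file in memory. This is a sketch: iter_rows is a hypothetical helper name, and io.StringIO with made-up rows stands in for the real file. The last line is captured even when it has no trailing newline:

```python
import io
from typing import Iterator, List, TextIO

def iter_rows(f: TextIO) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each line, reading one line at a time."""
    for line in f:
        yield line.rstrip('\n').split('\t')

# Made-up rows; the last line deliberately has no trailing newline.
sample = io.StringIO('22\trs1\t0\n22\trs2\t0\n22\trs3\t0')
rows = list(iter_rows(sample))
print(len(rows))  # 3
print(rows[-1])   # ['22', 'rs3', '0']
```

With the real file this becomes with open('/path/file.txt') as f: for row in iter_rows(f): ..., processing each row as it arrives.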

Upvotes: 0

Niaz Palak

Reputation: 327

I think you are looking for something like this:

with open('/path/file.txt', 'r') as f:
    for raw_line in f.readlines():
        line = raw_line.rstrip('\n').split("\t")
        print(line)

Upvotes: 0
