user7249622

Modifying each line in a text file in Python

I have a large file that looks like the example below:

1   10161   10166   3
1   10166   10172   2
1   10172   10182   1
1   10183   10192   1
1   10193   10199   1
1   10212   10248   1
1   10260   10296   1
1   11169   11205   1
1   11336   11372   1
2   11564   11586   2
2   11586   11587   3
2   11587   11600   4
3   11600   11622   2

I would like to add "chr" at the beginning of each line, for example:

chr1    10161   10166   3
chr1    10166   10172   2
chr1    10172   10182   1
chr1    10183   10192   1
chr1    10193   10199   1
chr1    10212   10248   1
chr1    10260   10296   1
chr1    11169   11205   1
chr1    11336   11372   1
chr2    11564   11586   2
chr2    11586   11587   3
chr2    11587   11600   4
chr3    11600   11622   2

I tried the following code in Python:

   file = open("myfile.bg", "r")
   for line in file: 
      newline = "chr" + line
   out = open("outfile.bg", "w")
   for new in newline:
      out.write("n"+new)

But it did not produce the output I wanted. Do you know how to fix the code for this purpose?

Upvotes: 2

Views: 623

Answers (3)

Thecave3

Reputation: 797

Totally agree with @ryachza. Here's my version, based on your code:

file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()

Upvotes: 2

thebjorn

Reputation: 27360

The problem with your code is that you iterate over the input file without doing anything with the data you read:

file = open("myfile.bg", "r")
for line in file: 
    newline = "chr" + line

The last line assigns each line in myfile.bg, with 'chr' prepended, to the newline variable (a string), and each iteration overwrites the previous result.

Then you iterate over the string in newline (which will be the last line in the input file, with 'chr' prepended):

out = open("outfile.bg", "w")
for new in newline:       # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)    # this only writes 'n' before each character in newline
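
To see the effect concretely, here is a minimal sketch that reproduces both loops on an in-memory list instead of a file. The sample lines and the `pieces` list are made up for illustration and are not part of the original code:

```python
# Hypothetical sample data standing in for the lines of myfile.bg.
lines = ["1\t10161\t10166\t3\n", "2\t11564\t11586\t2\n"]

newline = ""
for line in lines:
    newline = "chr" + line  # overwritten on every pass through the loop

# Only the last line survives the first loop.
print(repr(newline))

# Iterating over a string yields single characters, so "n" ends up
# in front of every character rather than in front of every line.
pieces = ["n" + ch for ch in newline]
print(pieces[:4])
```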

If you're just doing this once, e.g. in the shell, you could use the one-liner:

open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])

More correct (especially in a program, where you care about open file handles, etc.) would be:

with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])

If the file is really big (close to the size of your available memory), you'll need to process it incrementally:

with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)

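(This can be slower than the first two versions because it makes one write call per line, although Python buffers file output by default, so the difference is often small.)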
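
If you need this in more than one place, the incremental approach could be wrapped in a small function. This is only a sketch: the name `add_prefix` and its parameters are my own invention, and the demonstration writes made-up sample data to a temporary directory rather than touching your real files:

```python
import os
import tempfile


def add_prefix(src_path, dst_path, prefix="chr"):
    """Copy src_path to dst_path, prepending prefix to every line."""
    # A single with statement can manage both files; both are closed on exit.
    with open(src_path) as infp, open(dst_path, "w") as outfp:
        for line in infp:  # reads one line at a time, not the whole file
            outfp.write(prefix + line)


# Small demonstration on a temporary file (the sample data is made up).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "myfile.bg")
dst = os.path.join(tmpdir, "outfile.bg")
with open(src, "w") as f:
    f.write("1\t10161\t10166\t3\n2\t11564\t11586\t2\n")

add_prefix(src, dst)
with open(dst) as f:
    result = f.read()
print(result)
```

Memory use stays roughly constant regardless of file size, because only one line is held at a time.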

Upvotes: 1

ryachza

Reputation: 4540

The issue is that you iterate over the input and reassign the same variable (newline) for every line, so only the last line is kept. You then open a file for output and iterate over newline, which is a string, so new will be each character in that string.

I think something like this should be what you're looking for:

with open('myfile.bg','rb') as file:
  with open('outfile.bg','wb') as out:
    for line in file:
      # In binary mode each line is bytes, so the prefix must be bytes too
      out.write(b'chr' + line)

When iterating over a file, each line already includes its trailing newline (except possibly the last line), so there is no need to add one when writing.

The with statements will automatically clean up the file handle when the block ends.
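
Both points can be checked with a short sketch. Here `io.StringIO` is used as a stand-in for a real file, which is an assumption made only to keep the example self-contained, and the sample text is invented:

```python
import io

# In-memory stand-in for a text file; the content is made-up sample data.
buf = io.StringIO("1\t10161\n2\t11564\n")

with buf as fh:
    # Each item produced by iterating keeps its trailing "\n".
    lines = [line for line in fh]

print(lines)
# After the with block ends, the file object has been closed.
print(buf.closed)
```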

Upvotes: 0
