I have a big file like the example below:
1 10161 10166 3
1 10166 10172 2
1 10172 10182 1
1 10183 10192 1
1 10193 10199 1
1 10212 10248 1
1 10260 10296 1
1 11169 11205 1
1 11336 11372 1
2 11564 11586 2
2 11586 11587 3
2 11587 11600 4
3 11600 11622 2
I would like to add "chr" at the beginning of each line, for example:
chr1 10161 10166 3
chr1 10166 10172 2
chr1 10172 10182 1
chr1 10183 10192 1
chr1 10193 10199 1
chr1 10212 10248 1
chr1 10260 10296 1
chr1 11169 11205 1
chr1 11336 11372 1
chr2 11564 11586 2
chr2 11586 11587 3
chr2 11587 11600 4
chr3 11600 11622 2
I tried the following code in Python:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
out = open("outfile.bg", "w")
for new in newline:
    out.write("n"+new)
but it did not produce what I wanted. Do you know how to fix the code for this purpose?
Upvotes: 2
Views: 623
Reputation: 797
Totally agree with @rychaza; here's my version using your code:
file = open("myfile.bg", "r")
out = open("outfile.bg", "w")
for line in file:
    out.write("chr" + line)
out.close()
file.close()
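To check that this loop gives the desired output, here is a minimal sketch that writes a few of the question's sample lines to a temporary directory, runs the same read/prefix/write loop, and reads the result back. The temporary paths only stand in for myfile.bg and outfile.bg, and the `with` form (which closes both files even if an error occurs) is a small change from the answer's explicit close() calls.

```python
import os
import tempfile

# A few of the sample lines from the question
sample = "1 10161 10166 3\n2 11564 11586 2\n3 11600 11622 2\n"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "myfile.bg")
    dst = os.path.join(tmp, "outfile.bg")
    with open(src, "w") as f:
        f.write(sample)

    # Same logic as the answer: prepend "chr" to every line as it is read
    with open(src, "r") as file, open(dst, "w") as out:
        for line in file:
            out.write("chr" + line)

    # Read the output back while the temporary directory still exists
    with open(dst) as f:
        result = f.read()

print(result, end="")
# chr1 10161 10166 3
# chr2 11564 11586 2
# chr3 11600 11622 2
```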
Upvotes: 2
Reputation: 27360
The problem with your code is that you iterate over the input file without doing anything with the data you read:
file = open("myfile.bg", "r")
for line in file:
    newline = "chr" + line
The last line assigns each line in myfile.bg to the newline variable (a string, with 'chr' prepended), each line overwriting the previous result. Then you iterate over the string in newline (which will be the last line in the input file, with 'chr' prepended):
out = open("outfile.bg", "w")
for new in newline:  # <== this iterates over a string, so `new` will be individual characters
    out.write("n"+new)  # this only writes 'n' before each character in newline
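Both problems can be reproduced without touching the disk. This minimal sketch uses an in-memory list of two sample lines from the question as a stand-in for the open file:

```python
# In-memory stand-in for myfile.bg (two sample lines from the question)
lines = ["1 10161 10166 3\n", "1 10166 10172 2\n"]

for line in lines:
    newline = "chr" + line  # rebinds newline on every pass

# After the loop only the last line survives
print(repr(newline))  # 'chr1 10166 10172 2\n'

# Iterating over a string yields single characters, not lines
chars = [new for new in newline]
print(chars[:4])  # ['c', 'h', 'r', '1']
```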
If you're just doing this once, e.g. in the shell, you could use the one-liner:
open('outfile.bg', 'w').writelines(['chr' + line for line in open('myfile.bg').readlines()])
More correct (especially in a program, where you would care about open file handles etc.) would be:
with open('myfile.bg') as infp:
    lines = infp.readlines()
with open('outfile.bg', 'w') as outfp:
    outfp.writelines(['chr' + line for line in lines])
If the file is really big (close to the size of your available memory), you'll need to process it incrementally:
with open('myfile.bg') as infp:
    with open('outfile.bg', 'w') as outfp:
        for line in infp:
            outfp.write('chr' + line)
(This may be slower than the first two versions, though.)
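The incremental version can also be wrapped in a small function that accepts any file-like objects, which makes it easy to try out with in-memory buffers before running it on the real files. A minimal sketch; the name `prefix_lines` and its `prefix` parameter are hypothetical, not part of the original answer:

```python
import io


def prefix_lines(infp, outfp, prefix="chr"):
    """Copy infp to outfp line by line, prepending prefix to each line.

    Only one line is held in memory at a time, so the input can be larger
    than available RAM. Returns the number of lines written.
    """
    count = 0
    for line in infp:
        outfp.write(prefix + line)
        count += 1
    return count


# Usage with in-memory files; real code would pass open() handles instead
src = io.StringIO("1 10161 10166 3\n1 10166 10172 2\n")
dst = io.StringIO()
n = prefix_lines(src, dst)
print(n)  # 2
print(dst.getvalue(), end="")
```

With real files the call would be `prefix_lines(infp, outfp)` inside the two `with` blocks shown above.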
Upvotes: 1
Reputation: 4540
The issue is that you are iterating over the input and resetting the same variable (newline) for every line, then opening a file for output and iterating over newline, which is a string, so new will be each character in that string.
I think something like this should be what you're looking for:
with open('myfile.bg', 'rb') as file:
    with open('outfile.bg', 'wb') as out:
        for line in file:
            # binary mode yields bytes, so the prefix must be bytes too
            out.write(b'chr' + line)
When iterating over a file, line already contains its trailing newline (except possibly on the last line of the file). The with statements automatically close the file handles when their blocks end.
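Opening the files in 'rb'/'wb' mode means each line is a bytes object. The original version of this answer (with a plain 'chr' prefix) appears to assume Python 2, where str and bytes are the same type; in Python 3 a str prefix raises TypeError and a bytes prefix (b'chr') is needed. A minimal sketch, assuming Python 3 and using io.BytesIO as an in-memory stand-in for a file opened with 'rb':

```python
import io

# io.BytesIO stands in for a file opened with 'rb'; note the last line has no "\n"
infile = io.BytesIO(b"1 10161 10166 3\n1 10166 10172 2")
lines = list(infile)
print(lines)  # [b'1 10161 10166 3\n', b'1 10166 10172 2']

# In Python 3, concatenating str and bytes fails
err = None
try:
    "chr" + lines[0]
except TypeError as exc:
    err = exc  # keep a reference; exc is unbound after the except block
    print("TypeError:", exc)

# A bytes prefix works, and each line keeps whatever line ending it had
out = io.BytesIO()
for line in lines:
    out.write(b"chr" + line)
print(out.getvalue())  # b'chr1 10161 10166 3\nchr1 10166 10172 2'
```

Because the prefix goes at the start of each line, a missing newline on the last line does not matter here.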
Upvotes: 0