I have tab-delimited data from which I am exporting a select few columns into another file. The input is:
a b c d
1 2 3 4
5 6 7 8
9 10 11 12
and I get:
b, d
b, d
2, 4
b, d
2, 4
6, 8
b, d
2, 4
6, 8
10, 12
......
I want:
b, d
2, 4
6, 8
10, 12
My code is
f=open('data.txt', 'r')
f1=open('newdata.txt','w')
t=[]
for line in f.readlines():
    line =line.split('\t')
    t.append('%s,%s\n' %(line[0], line[3]))
    f1.writelines(t)
What am I doing wrong? Why is the output repeating?
Please help.
Thanks!
As already mentioned, the last line is incorrectly indented. On top of that, you are making things harder and more error-prone than they need to be. You don't need the t list, and you don't need to call f.readlines().
Another problem with your code is that line[3] will end with a newline (because readlines() and friends leave the newline at the end of each line), and you are adding another newline in the format '%s,%s\n'. This would have produced double spacing in your output file, but you haven't mentioned that.
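A quick way to see the trailing-newline problem is to split a single line by hand. The sketch below uses a hard-coded sample line (an assumption standing in for one line of data.txt) and shows that the last field keeps its newline, so the extra \n in the format string ends each record with a blank line.

```python
# One line as it would come out of readlines() or "for line in f":
# the trailing newline is still attached (sample line assumed).
raw = '1\t2\t3\t4\n'

fields = raw.split('\t')
print(repr(fields[3]))   # prints '4\n': the last field carries the newline

# Adding another '\n' in the format string gives two newlines per record.
doubled = '%s,%s\n' % (fields[0], fields[3])
print(repr(doubled))     # prints '1,4\n\n'

# Stripping the newline before splitting leaves clean fields.
clean = raw.rstrip('\n').split('\t')
print(repr('%s,%s\n' % (clean[0], clean[3])))   # prints '1,4\n'
```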
Also, you say that you want b, d in the first output line, and you say that you get b, d; however, your code says '%s,%s\n' % (line[0], line[3]), which would produce a,d. Note the two differences: (1) the space after the comma is missing, and (2) you get a instead of b.
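Python list indices start at 0, so for the header row a b c d the field at index 0 is a, not b. The sketch below uses the question's header row as sample input to show which indices give columns b and d, and that the space after the comma has to come from the format string itself.

```python
# Header row from the question, tab-delimited.
header = 'a\tb\tc\td'
cols = header.split('\t')             # ['a', 'b', 'c', 'd']

posted = '%s,%s' % (cols[0], cols[3])   # what the posted code writes
wanted = '%s, %s' % (cols[1], cols[3])  # what the question asks for
print(posted)   # a,d
print(wanted)   # b, d
```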
Overall: you say that you get b, d\n, but the code that you show would produce a,d\n\n. In future, please show code and output that correspond with each other. Use copy/paste; don't type from memory.
Try this:
f = open('data.txt', 'r')
f1 = open('newdata.txt','w')
for line in f: # reading one line at a time
    fields = line.rstrip('\n').split('\t')
    # ... using rstrip to remove the newline.
    # Re-using the name `line` as you did makes your script less clear.
    f1.write('%s,%s\n' % (fields[0], fields[3]))
    # Change the above line as needed to make it agree with your desired output.
f.close()
f1.close()
# Always close files when you have finished with them,
# especially files that you have written to.
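If the goal is exactly the output shown in the question (b, d, then 2, 4 and so on), the write line needs fields[1] and a space after the comma. Below is a minimal sketch of that variant. It swaps the explicit close() calls for with blocks, which close both files automatically even if an exception occurs. The function name extract_columns is hypothetical, and the temporary directory setup that writes a sample data.txt exists only so the sketch runs on its own.

```python
import os
import tempfile

def extract_columns(src_path, dst_path):
    """Hypothetical helper: copy columns 2 and 4 (indices 1 and 3) of a
    tab-delimited file into dst_path as 'x, y' lines."""
    # 'with' closes both files automatically, even if an exception occurs.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:                         # one line at a time
            fields = line.rstrip('\n').split('\t')
            dst.write('%s, %s\n' % (fields[1], fields[3]))

# Self-contained demo: build the question's sample file in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'data.txt')
    out_path = os.path.join(tmp, 'newdata.txt')
    with open(data_path, 'w') as f:
        f.write('a\tb\tc\td\n1\t2\t3\t4\n5\t6\t7\t8\n9\t10\t11\t12\n')

    extract_columns(data_path, out_path)
    with open(out_path) as f:
        result = f.read()

print(result, end='')   # b, d / 2, 4 / 6, 8 / 10, 12 on separate lines
```

Note that a blank line in the input (for example a trailing empty line) would make fields[3] raise IndexError; skipping empty lines would be a small addition if your data has them.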
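The indentation is wrong, so you are writing the entire list t on every iteration instead of only once at the end. Because t keeps growing, each iteration writes all the lines collected so far, which is why the output repeats. Change it to this: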
t=[]
for line in f.readlines():
    line = line.split('\t')
    t.append('%s,%s\n' % (line[0], line[3]))
f1.writelines(t)
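To see why the original indentation repeats the output, the sketch below runs both versions against in-memory files, with io.StringIO standing in for data.txt and newdata.txt (an assumption so it runs without touching the disk). It uses fields[1] and rstrip so the output matches what the question reports; the hypothetical run() helper only toggles where writelines is called. With writelines inside the loop the growing list is written 4 times, giving 1 + 2 + 3 + 4 = 10 lines; outside the loop it is written once.

```python
import io

# The question's sample data, tab-delimited.
SAMPLE = 'a\tb\tc\td\n1\t2\t3\t4\n5\t6\t7\t8\n9\t10\t11\t12\n'

def run(write_inside_loop):
    src, dst = io.StringIO(SAMPLE), io.StringIO()
    t = []
    for line in src:
        fields = line.rstrip('\n').split('\t')
        t.append('%s, %s\n' % (fields[1], fields[3]))
        if write_inside_loop:
            dst.writelines(t)    # question's version: rewrites all of t each pass
    if not write_inside_loop:
        dst.writelines(t)        # fixed version: t written once, after the loop
    return dst.getvalue()

buggy = run(True)
fixed = run(False)
print(buggy, end='')   # 10 lines: the header block repeats as t grows
print('---')
print(fixed, end='')   # 4 lines: b, d / 2, 4 / 6, 8 / 10, 12
```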
Alternatively, you could write the lines one at a time instead of waiting until the end; then you don't need the list t at all.
for line in f.readlines():
    line = line.split('\t')
    s = '%s,%s\n' % (line[0], line[3])
    f1.write(s)