Reputation: 78736
A sample of my CSV file (test.csv) is below. Note: the full test.csv file is about 60MB.
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
"2545610","21"
"2545611","18"
"2545612","19"
"2545613","21"
"2545614","21"
"2545615","21"
"2545616","21"
"2545617","22"
"2545618","25"
"2545619","25"
My Python code (test.py) is below:
#!/usr/bin/python
import sys
txt = open(sys.argv[1], 'r')
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])
out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')
for line in txt:
    if 'Position' not in line:
        line = line.strip('",\n')
        line = line.split('","')
        line[1] = str(int(line[1])/mil)
        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')
txt.close()
out.close()
My command line:
python test.py test.csv test.igv 5
When I run the command, I get an error:
Traceback (most recent call last):
File "test.py", line 15, in <module>
line[1] = str(int(line[1])/mil)
ValueError: invalid literal for int() with base 10: '3"\r'
However, if I create a new empty CSV file (small.csv) and copy/paste only a few lines from test.csv into it (like the sample above), the same command runs successfully:
python test.py small.csv small.igv 5
Input small.csv:
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
Output small.igv:
chr start end feature small.igv
gi|255767013|ref|NC_000964.3| 2545600 2545600 3.8
gi|255767013|ref|NC_000964.3| 2545601 2545601 3.8
gi|255767013|ref|NC_000964.3| 2545602 2545602 3.8
gi|255767013|ref|NC_000964.3| 2545603 2545603 3.8
gi|255767013|ref|NC_000964.3| 2545604 2545604 4.0
gi|255767013|ref|NC_000964.3| 2545605 2545605 4.0
gi|255767013|ref|NC_000964.3| 2545606 2545606 4.2
gi|255767013|ref|NC_000964.3| 2545607 2545607 4.4
gi|255767013|ref|NC_000964.3| 2545608 2545608 4.2
gi|255767013|ref|NC_000964.3| 2545609 2545609 4.0
That's all I want. So why can't I do the same with the larger CSV file?
Upvotes: 0
Views: 1550
Reputation: 2257
Using the csv module is much better in this case. Each row read from the CSV file is returned as a list of strings, so the issue of stripping whitespace (including line endings) does not arise, and you can specify the delimiter (not needed here) as an argument to the csv.reader function.
import csv
import sys
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])
out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    headers = reader.next()  # Consider headers separately
    for line in reader:
        line[1] = str(int(line[1])/mil)
        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')
out.close()
python test.py test.csv test.igv 5 && cat test.igv
should show the expected output.
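The code above is written for Python 2 (`reader.next()`, opening the file in `'rb'` mode). For readers on Python 3, a rough sketch of the same approach might look like the following. The `convert` function and its parameter names are made up for illustration and are not part of the original answer; skipping empty rows is an added assumption.

```python
import csv

def convert(in_path, out_path, mil, track_name):
    """Hypothetical helper: divide each Value by mil and write tab-separated rows."""
    # newline="" lets the csv module handle "\r\n" and "\n" line endings itself.
    with open(in_path, newline="") as src, open(out_path, "w") as out:
        reader = csv.reader(src)
        next(reader)  # skip the "Position","Value" header row
        out.write("chr\tstart\tend\tfeature\t" + track_name + "\n")
        for row in reader:
            if not row:
                continue  # assumption: blank lines are simply ignored
            position, value = row
            scaled = str(int(value) / mil)
            out.write("gi|255767013|ref|NC_000964.3|\t" + position + "\t"
                      + position + "\t\t" + scaled + "\n")
```

It could be called as `convert("test.csv", "test.igv", 5.0, "test.igv")`, mirroring the command line in the question.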
Upvotes: 1
Reputation: 4903
As already suggested, the csv module is more helpful here.
For example:
import csv
f = open("ex.csv")
for line in csv.reader(f):
    print line
and data of
"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
gives output of
['Position', 'Value']
['2545600', '19']
['2545601', '19']
['2545602', '19']
['2545603', '19']
which is much more manageable.
The csv module can also write CSV files.
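As a rough illustration of the writing side, here is a minimal sketch using `csv.writer` with a tab delimiter to produce rows shaped like the question's output. It is written for Python 3, and the sample rows and the in-memory buffer are assumptions made for the example.

```python
import csv
import io

# Hypothetical rows: (position, scaled value) pairs like those in the question.
rows = [("2545600", 3.8), ("2545601", 3.8)]

# An in-memory buffer stands in for the output file; with a real file,
# open it with newline="" as the csv documentation recommends.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["chr", "start", "end", "feature", "small.igv"])
for pos, value in rows:
    # The empty string produces the empty "feature" column.
    writer.writerow(["gi|255767013|ref|NC_000964.3|", pos, pos, "", value])

print(buf.getvalue())
```

The writer takes care of joining fields and terminating lines, so no manual `'\t'` and `'\n'` concatenation is needed.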
Upvotes: 0
Reputation:
Try
for line in ..... :
    line = line.strip()
This will remove the line-endings from the line string.
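The `'3"\r'` in the traceback suggests that the large file uses Windows-style `\r\n` line endings (an assumption; the question does not confirm it). In that case `strip('",\n')` stops at the `\r` and leaves the closing quote in place. A small sketch with a made-up sample line shows the difference:

```python
# Hypothetical sample line with a Windows-style "\r\n" ending.
raw = '"2545600","3"\r\n'

# The question's approach: '\r' is not in the strip set, so stripping
# stops there and the closing quote survives.
broken = raw.strip('",\n').split('","')

# Stripping whitespace first removes '\r\n', after which the quotes
# can be stripped as before.
fixed = raw.strip().strip('",').split('","')

print(broken)  # ['2545600', '3"\r']
print(fixed)   # ['2545600', '3']
```

Calling `int()` on the second element of `broken` raises the same `ValueError` as in the question, while the second element of `fixed` converts cleanly.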
A better solution: use Python's csv module, which handles such details for you.
Upvotes: 4