Stickers

Reputation: 78736

ValueError: invalid literal for int() with base 10: '3"\r'

A sample of my CSV file (test.csv) is below. Note that the full test.csv is about 60 MB.

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"
"2545610","21"
"2545611","18"
"2545612","19"
"2545613","21"
"2545614","21"
"2545615","21"
"2545616","21"
"2545617","22"
"2545618","25"
"2545619","25"

My python code (test.py) below:

#!/usr/bin/python
import sys

txt = open(sys.argv[1], 'r')
out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])

out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')

for line in txt:
    if 'Position' not in line:
        line = line.strip('",\n')
        line = line.split('","')

        line[1] = str(int(line[1])/mil)

        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')

txt.close()
out.close()

My command line:

python test.py test.csv test.igv 5

After I run the command I got an error:

Traceback (most recent call last):
  File "test.py", line 15, in <module>
    line[1] = str(int(line[1])/mil)
ValueError: invalid literal for int() with base 10: '3"\r'

However, if I create a new, empty CSV file (small.csv) and copy and paste only a few lines from test.csv into it (like the sample above), the same command runs successfully:

python test.py small.csv small.igv 5

Input small.csv:

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"
"2545604","20"
"2545605","20"
"2545606","21"
"2545607","22"
"2545608","21"
"2545609","20"

Output small.igv:

chr start   end feature small.igv
gi|255767013|ref|NC_000964.3|   2545600 2545600     3.8
gi|255767013|ref|NC_000964.3|   2545601 2545601     3.8
gi|255767013|ref|NC_000964.3|   2545602 2545602     3.8
gi|255767013|ref|NC_000964.3|   2545603 2545603     3.8
gi|255767013|ref|NC_000964.3|   2545604 2545604     4.0
gi|255767013|ref|NC_000964.3|   2545605 2545605     4.0
gi|255767013|ref|NC_000964.3|   2545606 2545606     4.2
gi|255767013|ref|NC_000964.3|   2545607 2545607     4.4
gi|255767013|ref|NC_000964.3|   2545608 2545608     4.2
gi|255767013|ref|NC_000964.3|   2545609 2545609     4.0

That is exactly the output I want. So why does the script fail on the larger CSV file?

Upvotes: 0

Views: 1550

Answers (3)

siddharthlatest

Reputation: 2257

The csv module is a much better fit here. Each row read from the CSV file is returned as a list of strings, with the quotes and line endings already removed, so there is nothing to strip by hand. You can also pass a delimiter to csv.reader, although the default comma is already right for this file.

import csv
import sys

out = open(sys.argv[2], 'w')
mil = float(sys.argv[3])

out.write('chr\tstart\tend\tfeature\t'+sys.argv[2]+'\n')
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)    # Read the header row separately; next() works on Python 2.6+ and 3
    for line in reader:
        line[1] = str(int(line[1])/mil)
        out.write('gi|255767013|ref|NC_000964.3|\t'+line[0]+'\t'+line[0]+'\t\t'+line[1]+'\n')
out.close()

Running python test.py test.csv test.igv 5 && cat test.igv should then show the expected output.
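For readers on Python 3, the same idea can be sketched as below. This is only an illustration under some assumptions: the function name convert and the in-memory sample are made up here, and io.StringIO stands in for the real files so the snippet runs on its own. The sample deliberately uses '\r\n' line endings to show that csv.reader copes with them.

```python
import csv
import io

CHROM = 'gi|255767013|ref|NC_000964.3|'

def convert(src, dst, divisor, track_name):
    """Read quoted Position/Value rows from src and write IGV-style rows to dst."""
    reader = csv.reader(src)
    next(reader)  # skip the "Position","Value" header row
    dst.write('chr\tstart\tend\tfeature\t' + track_name + '\n')
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        position, value = row
        scaled = str(int(value) / divisor)
        dst.write(CHROM + '\t' + position + '\t' + position + '\t\t' + scaled + '\n')

# Hypothetical in-memory sample with Windows (CRLF) line endings.
sample = io.StringIO('"Position","Value"\r\n"2545600","19"\r\n"2545604","20"\r\n')
result = io.StringIO()
convert(sample, result, 5.0, 'small.igv')
print(result.getvalue())
```

With real files on Python 3, the csv documentation recommends opening the input with open(path, newline='') rather than 'rb', so that the reader sees the line endings unmodified.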

Upvotes: 1

sotapme

Reputation: 4903

As suggested, the csv module is more helpful.

For example:

import csv
f = open("ex.csv")
for line in csv.reader(f):
    print line

and data of

"Position","Value"
"2545600","19"
"2545601","19"
"2545602","19"
"2545603","19"

gives output of

['Position', 'Value']
['2545600', '19']
['2545601', '19']
['2545602', '19']
['2545603', '19']

which is much more manageable.

The csv module can also write CSV files.
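As a rough sketch of the writing side (Python 3 syntax assumed), csv.writer can produce the tab-separated output if it is given delimiter='\t'. The io.StringIO buffer is only a stand-in for the real output file, and the single data row is taken from the sample output in the question.

```python
import csv
import io

# In-memory buffer standing in for the output file (an assumption for this demo).
buffer = io.StringIO()

# delimiter='\t' gives tab-separated fields; lineterminator='\n' avoids the
# default '\r\n' row ending.
writer = csv.writer(buffer, delimiter='\t', lineterminator='\n')
writer.writerow(['chr', 'start', 'end', 'feature', 'small.igv'])
writer.writerow(['gi|255767013|ref|NC_000964.3|', '2545600', '2545600', '', 3.8])

output = buffer.getvalue()
print(output)
```

Non-string values such as the float 3.8 are converted to text by the writer, so there is no need to call str() on them first.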

Upvotes: 0

user2665694


Try

for line in ..... :
     line = line.strip()

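This removes leading and trailing whitespace, including the '\r' that is still attached to the value in your error message ('3"\r'). Your strip('",\n') removes only the characters you list, so it stops at the '\r' and leaves the closing quote behind. Strip the quotes after this call.

Better solution: use Python's csv module, which deals with these details for you.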
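To illustrate, here is a small sketch of how strip behaves on a line that ends with '\r\n'. The sample line and its value of 23 are made up for the demonstration; they are not taken from the real test.csv.

```python
# A hypothetical line as it looks when the file uses Windows (CRLF) line endings.
raw = '"2545602","23"\r\n'

# strip() with an argument removes only the listed characters from each end
# and stops at the first character outside that set. '\r' is not in '",\n',
# so the closing quote and the '\r' stay attached to the value.
broken = raw.strip('",\n').split('","')
print(repr(broken[1]))  # int() on this value would raise ValueError

# Stripping whitespace first removes '\r\n'; the quotes can then be stripped.
fixed = raw.strip().strip('"').split('","')
value = int(fixed[1])
print(value)
```

A file created by copy and paste in an editor that saves '\n' line endings would not hit this, which would explain why small.csv works.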

Upvotes: 4
