codious

Reputation: 3511

CSV manipulation

I have the following csv file:

hindex
1
2
2
6
3
3
3
2
2

I am trying to read each row and check its value, but I get the following error:

ValueError: invalid literal for int() with base 10: 'hindex'

The code is:

import csv
cr = csv.reader(open('C:\\Users\\chatterjees\\Desktop\\data\\topic_hindex.csv', "rb"))
for row in cr:
    x=row[0]
    if(int(x)<=10):
        print x

What's wrong with my code?
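A minimal reproduction, assuming Python 3 and using an in-memory io.StringIO in place of the real file: csv.reader returns every row as a list of strings, header included, so the very first int() call receives 'hindex'.

```python
import csv
import io

# In-memory stand-in for topic_hindex.csv (same header, first few values).
data = io.StringIO("hindex\n1\n2\n2\n6\n")

reader = csv.reader(data)
first_row = next(reader)  # the header is returned like any other row
print(first_row)          # ['hindex']

try:
    int(first_row[0])
except ValueError as exc:
    message = str(exc)
    print(message)  # invalid literal for int() with base 10: 'hindex'
```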

Upvotes: 0

Views: 745

Answers (7)

chfw

Reputation: 4592

Here is one more alternative. I wrote a wrapper library, pyexcel, that can handle this task with ease too. Suppose you have saved the data in a file named "topic_hindex.csv" in the same directory as the following script.

import pyexcel


r = pyexcel.SeriesReader("topic_hindex.csv")
for row in r.rows():
    x = row[0]
    if x <= 10:
        print x

Or alternatively, you can use a filter:

import pyexcel


r = pyexcel.SeriesReader("topic_hindex.csv")
eval_func = lambda row: row[0] <= 10
r.filter(pyexcel.RowValueFilter(eval_func))
for row in r.rows():
    print row[0]
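For readers without pyexcel, here is a rough standard-library analogue of the filter version, assuming Python 3. It is not pyexcel's API: it skips the header with next() and applies the same row[0] <= 10 test through the built-in filter(), converting to int explicitly because csv.reader returns strings. The data is in memory, and the value 12 is a hypothetical extra row added so the filter has something to drop.

```python
import csv
import io

# Hypothetical sample: original-style values plus a made-up 12.
data = io.StringIO("hindex\n1\n2\n12\n6\n")

reader = csv.reader(data)
next(reader)  # discard the 'hindex' header row

# Keep only rows whose first value, read as an int, is at most 10.
small = filter(lambda row: int(row[0]) <= 10, reader)
values = [int(row[0]) for row in small]
print(values)  # [1, 2, 6]
```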

Upvotes: 1

Thanasis Petsas

Reputation: 4448

The first row cannot be transformed into an integer. You can skip every row like it by using a try/except block:

import csv
cr = csv.reader(open('C:\\Users\\chatterjees\\Desktop\\data\\topic_hindex.csv', "rb"))
for row in cr:
    x = row[0]
    try:
        if int(x) <= 10:
            print x
    except ValueError:
        pass  # not a number (e.g. the header): skip it
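One consequence of the try/except approach is that it skips every non-numeric row, not only the header. The sketch below, assuming Python 3, wraps the idea in a small generator (values_up_to is a hypothetical name) and feeds it made-up in-memory data containing an empty line and an 'n/a' entry; both are dropped without complaint. Whether that is desirable depends on whether bad rows should be ignored or reported.

```python
import csv
import io


def values_up_to(lines, limit=10):
    """Yield first-column values <= limit, silently skipping any row
    whose first cell is not an integer (header, typos, etc.)."""
    for row in csv.reader(lines):
        if not row:       # an empty line is returned as []
            continue
        try:
            x = int(row[0])
        except ValueError:
            continue      # not a number: skip this row
        if x <= limit:
            yield x


# Hypothetical input: header, an empty line and a stray 'n/a' entry.
sample = io.StringIO("hindex\n1\n\n2\nn/a\n6\n")
result = list(values_up_to(sample))
print(result)  # [1, 2, 6]
```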

Upvotes: 2

roskakori

Reputation: 3346

Here's a solution that skips the first row, and only the first row, and fails with a ValueError if any other row contains a non-numeric value. It does so by using the built-in enumerate() function, which keeps count of the number of rows processed. Furthermore, it properly closes the input file when it's done by using the with statement.

import csv
with open('C:\\Users\\chatterjees\\Desktop\\data\\topic_hindex.csv', 'rb') as csvFile:
    for rowNumber, row in enumerate(csv.reader(csvFile)):
        if rowNumber > 0:
            x = row[0]
            if int(x) <= 10:
                print x
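To see the difference from the try/except answer, here is the same enumerate logic, assuming Python 3 and in-memory data, packaged as a hypothetical values_after_header function: a clean file yields its values, while a non-numeric value in a data row raises ValueError instead of being skipped.

```python
import csv
import io


def values_after_header(lines, limit=10):
    """Skip only row 0 (the header); any later non-numeric row raises."""
    result = []
    for row_number, row in enumerate(csv.reader(lines)):
        if row_number == 0:
            continue  # header row
        x = int(row[0])  # ValueError propagates for bad data rows
        if x <= limit:
            result.append(x)
    return result


print(values_after_header(io.StringIO("hindex\n1\n2\n6\n")))  # [1, 2, 6]

try:
    values_after_header(io.StringIO("hindex\n1\nn/a\n6\n"))
except ValueError as exc:
    print("bad data row:", exc)
```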

Upvotes: 1

nneonneo

Reputation: 179717

It's rather surprising that nobody has mentioned csv.DictReader, since it's really the simplest way to skip the header row and get the data in a nice dictionary format:

import csv
with open('C:\\Users\\chatterjees\\Desktop\\data\\topic_hindex.csv', "rb") as f:
    cr = csv.DictReader(f)
    for row in cr:
        x = row['hindex']
        if int(x) <= 10:
            print x
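A self-contained version of the DictReader approach, assuming Python 3 (where a real file would be opened with open(path, newline='') rather than "rb"). The header row is consumed to build fieldnames, so it never reaches the int() call; the values themselves are still strings and need converting.

```python
import csv
import io

data = io.StringIO("hindex\n1\n2\n2\n6\n3\n")

reader = csv.DictReader(data)
print(reader.fieldnames)  # ['hindex'], read from the header row

# Each data row is a dict keyed by column name; values are strings.
values = [int(row["hindex"]) for row in reader]
small = [x for x in values if x <= 10]
print(small)  # [1, 2, 2, 6, 3]
```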

Upvotes: 2

ch3ka

Reputation: 12178

The first line of your .csv contains something that cannot be converted to an int, so

    if(int(x)<=10):

fails with a ValueError. (By the way, there is no need to enclose the expression in parentheses.)

You can either skip the first line of the .csv, or wrap int(x) in a try/except block, like so:

for row in cr:
    x=row[0]
    try:
        x=int(x)
    except ValueError: # x cannot be converted to int
        continue       # so we skip this row
    if x<=10:  # no need for parens here
        print x
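Silently skipping with continue can hide bad data. A variation, assuming Python 3 and in-memory data, records what was skipped using csv.reader's line_num attribute (the number of source lines read so far), so the header shows up as the expected first entry and anything else stands out. The 'oops' value is a hypothetical bad entry.

```python
import csv
import io

data = io.StringIO("hindex\n1\n2\noops\n6\n")  # 'oops' is hypothetical

reader = csv.reader(data)
kept, skipped = [], []
for row in reader:
    try:
        x = int(row[0])
    except ValueError:
        # remember where the unconvertible value was and what it was
        skipped.append((reader.line_num, row[0]))
        continue
    if x <= 10:
        kept.append(x)

print(kept)     # [1, 2, 6]
print(skipped)  # [(1, 'hindex'), (4, 'oops')]
```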

Learn more about Exceptions and handling those here: http://docs.python.org/tutorial/errors.html

Upvotes: 4

cmh

Reputation: 10937

The code tries to process every line in your file, including the header line hindex. Converting that string to an int raises the ValueError.

To skip the first line (which contains the headers) try:

import csv
cr = csv.reader(open('C:\\Users\\chatterjees\\Desktop\\data\\topic_hindex.csv', "rb"))
next(cr)  # skip the header row; a csv.reader cannot be sliced like cr[1:]
for row in cr:
    x = row[0]
    if int(x) <= 10:
        print x
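csv.reader returns an iterator rather than a list, so it cannot be sliced with cr[1:] (that raises TypeError). If slice-like syntax is wanted, itertools.islice gives the same effect lazily; a sketch assuming Python 3 and in-memory data:

```python
import csv
import io
from itertools import islice

data = io.StringIO("hindex\n1\n2\n2\n6\n")

cr = csv.reader(data)
# islice(cr, 1, None) is the lazy iterator equivalent of [1:]:
# it drops the first row (the header) and yields the rest.
values = [int(row[0]) for row in islice(cr, 1, None)]
small = [x for x in values if x <= 10]
print(small)  # [1, 2, 2, 6]
```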

Upvotes: 4

Silas Ray

Reputation: 26160

You need to skip row 1. The code is trying to parse the column header from the file into an int, but since it is a non-numeric string, the conversion fails.
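Skipping row 1 can be done by calling next() on the reader. A sketch, assuming Python 3 and a hypothetical read_values helper: giving next() a default of None means an empty file returns no header instead of raising StopIteration.

```python
import csv
import io


def read_values(lines):
    """Return (header_row, list_of_ints) for a one-column CSV."""
    reader = csv.reader(lines)
    header = next(reader, None)  # row 1, or None if the file is empty
    if header is None:
        return None, []
    return header, [int(row[0]) for row in reader]


print(read_values(io.StringIO("hindex\n1\n2\n6\n")))  # (['hindex'], [1, 2, 6])
print(read_values(io.StringIO("")))                   # (None, [])
```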

Upvotes: 4
