LearningNinja
LearningNinja

Reputation: 447

Sorting every column in a csv file

I have a csv file with following entries:

    Year,Month,Company A, Company B,Company C, .............Company N
    1990, Jan, 10, 15, 20, , ..........,50
    1990, Feb, 10, 15, 20, , ..........,50

I am trying to sort the csv file data for Company A, and so on, till Company N.

My piece of code works fine for the 1st run in the loop, but fails for the 2nd.

    try:
        reader = csv.DictReader(open(self.filename,'r')) #Try and open the file with csv dictreader
    except IOError:
        print "Error Opening File -- Check if file exists"

    ncols = reader.next()
    print ncol.keys()
    for key in ncols.keys():
        if key != 'Month' and key != 'Year':
            print key
            result = sorted(reader, key=lambda d: float(d[key]))
            result = result[-1]
            #print "Year " ,
            print result['Year'],
            #print "Month ",
            print result ['Month'],
            print key,
            print result[key]

Output:

    Company-E
    2008 Oct Company-E 997
    Company-D

    Traceback (most recent call last):
    File "<pyshell#105>", line 1, in <module>
    read.ParseData()
    File "C:/Users/prince/Desktop/CsvRead.py", line 55, in ParseData
    result = result[-1]
    IndexError: list index out of range

Upvotes: 1

Views: 127

Answers (2)

LearningNinja
LearningNinja

Reputation: 447

The code did work by adding two more lines: I needed to rewind the file to initial position.

fh.seek(0)

fh.next()

Here's the working piece of the code:

        actualResult = {}
        try:
            fh = open(filename,'r')
            reader = csv.DictReader(fh) #Try and open the file with csv dictreader

            #Get the field names in the file:
            fields = set(reader.fieldnames)
            if not fields or ('Year' not in fields and 
            'Month' not in fields):
                raise BadInputFile(filename)
            companies = fields - {'Year', 'Month'}
            print companies
            for name in companies:
                #sorting the csv file data based on column data with Company Name as Key
                result = sorted(reader, key=lambda d: float(d[name]), reverse=True)
                result = result[0]
                tup = (result[name],result['Year'],result['Month'])
                if name not in actualResult.keys():
                    actualResult.update({str(name): tup})
                else:
                    raise BadInputFile(filename)
                fh.seek(0) #rewinding the file to initial position
                fh.next()  #Moving to the 1st row
        except (IOError, BadInputFile) as e:
            print "Error: ", str(e) # Invalid input file
            raise


        return actualResult

Upvotes: 0

freidrichen
freidrichen

Reputation: 2566

I would suggest using pandas:

import pandas
df = pandas.read_csv(filename)
for col in df.columns:
    if col != 'Month' and col != 'Year':
        df = df.sort(col)
df.to_csv(out_filename, index=False)

Upvotes: 3

Related Questions