HBS
HBS

Reputation: 55

Sort csv by numerical values

I was looking and reading previous questions and answers similar to mine, but didn't find a good solution to my case.

I have a csv file with 2 columns. One is numerical values (100, 40, 350 etc.), and the other is a list of pictures tag name (i.e. computer.jpg).

What I am trying to do is to sort the file by the first column (i.e. the values) in ascending manner. That also means that I want the pair of the value (the image name) to move accordingly with it.

This is what I have so far:

import csv
import operator
sample=open('random.csv', "r")
csv1=csv.reader(sample, delimiter='.')
header=next(csv1, None)
sort= sorted(csv1, key=operator.itemgetter(0))

with open('randomized.csv', "wb") as f:
    csv_writer=csv.writer(f, delimiter='.')
    if header:
        csv_writer.writerow(header)
    csv_writer.writerows(sort)

The resulted sorted list, however, is like this: 100, 175, 20, 250, 3.

I tried some other codes, like this:

sort=csv1.sort(key=lambda row: row[0], reverse=True)

But I receive an error back: AttributeError: '_csv.reader' object has no attribute 'sort'

I read about one option which is to convert the numbers to strings before sorting, and then converting them back to numbers before writing the file, but don't really have the tools to do it.

Anyone has a clue how to get this one down?

Upvotes: 3

Views: 8496

Answers (1)

Padraic Cunningham
Padraic Cunningham

Reputation: 180461

You need to cast to int:

srt= sorted(csv1, key=lambda x: int(x[0]))

You are sorting strings where 100 < 12

sort=csv1.sort(key=lambda row: row[0], reverse=True) gives you an error because csv1 is a reader object not a list, you would have to call list on the reader object but you may as well just use sorted.

You can see using ints gives very different output:

In [46]: l = ["100","2","12","200"]

In [47]: sorted(l)
Out[47]: ['100', '12', '2', '200']

In [48]: sorted(l,key=lambda x: int(x))
Out[48]: ['2', '12', '100', '200']

Sorting as strings python goes char by char, the first char with a greater value will be considered a higher value so as in the first sorted output 100 is considered < 12 because 2 > 0.

Putting it all together:

import csv

with open('random.csv') as sample, open('randomized.csv', "w") as out:
    csv1=csv.reader(sample)
    header = next(csv1, None)
    csv_writer = csv.writer(out)
    if header:
        csv_writer.writerow(header)
    csv_writer.writerows(sorted(csv1, key=lambda x:int(x[0])))

The output will be:

value, image
50, bottle.jpg
75, broomstick.jpg
175, book.jpg
100, car.jpg

Which is not putting the data into one column, the , is what delimits each column, i.e comma seperated values.

Upvotes: 7

Related Questions