user1710791
user1710791

Reputation: 21

How to find top values in a csv column and print the entire row having these top values using Python?

I am relatively new to python and I find it to be useful since I do need to routinely find values out of a large csv file thus I tried to use it.

This is my csv file:
Name, Tag, Size, Height1, Height2,
Name1, B1, 244,42798,5900
Name2, B4, 200,22798,2234
Name3, B5, 240,25798,2745
Name4, B7, 220,32798,4590

I tried to use this code but I still get them jumbled up.

import csv
input = open('file.csv','r')
number_top_values =  raw_input(‘How many top values you need to find?’) #number of top values
file = csv.reader(input)
line1 = file.next()
height = [(row[3],(row[4])) for row in file]
height.sort(key = lambda x: x[1])
height.reverse()
height = height[:number_top_values]
print height

I need to find in the column the top values for Height1 and Height2 (top 2 or top 3 so on depending on how many top values I need to find) and get the entire row having these top values. Any suggestions or possible answers will be a great help. Thanks.

Upvotes: 2

Views: 1002

Answers (2)

icktoofay
icktoofay

Reputation: 129001

You're currently using this:

height = [(row[3],(row[4])) for row in file]
height.sort(key = lambda x: x[1])
height.reverse()

In the first line there, you remove some of the data you need (since you need the whole row). The third line can be made redundant with a modification to the second line. Together:

height = list(file)
height.sort(key=lambda x: int(x[3]), reverse=True)

This sorts on Height1. If you want to sort on Height2, change 3 to 4. If you want to sort on one and then the other, you can sort twice or do something a little trickier:

height.sort(key=lambda x: (int(x[3]), int(x[4])), reverse=True)

Upvotes: 2

Anuj Gupta
Anuj Gupta

Reputation: 10526

Mostly what icktoofay said :)

Working code:

import csv
inputfile = open('file.csv','r')
#don't forget int() to convert input to integer for slicing
while(True):
    try:
        number_top_values =  int(raw_input('How many top values you need to find?')) #number of top values
    except ValueError:
        print "Invalid input! Please try again"
    else:
        break
csvfile = csv.reader(inputfile)
height = list(csvfile)[1:] #exclude first line for labels
height1 = sorted(height, key = lambda x: int(x[3]), reverse=True)[:number_top_values]
height2 = sorted(height, key = lambda x: int(x[4]), reverse=True)[:number_top_values]
# or height.sort(key = lambda x: (int(x[3]),int(x[4])), reverse=True) for double sort
print height1
print height2
inputfile.close()

Upvotes: 1

Related Questions