Daniel Lynes

Reputation: 67

Trying to get the largest number in a column of a .csv file

This is what I have currently. I get the error 'int' object is not iterable. If I understand correctly, my issue is that BIKES_AVAILABLE is assigned a number at the top of my project, so instead of looking at the column it is looking at that number and hitting an error. How should I go about going through the column? I apologize in advance for the newbie question.

for i in range(len(stations[BIKES_AVAILABLE]) -1):
    most_bikes = max(stations[BIKES_AVAILABLE])
sort(stations[BIKES_AVAILABLE]).remove(max(stations[BIKES_AVAILABLE]))

if most_bikes == max(stations[BIKES_AVAILABLE]):
    second_most = max(stations[BIKES_AVAILABLE])
    index_1 = index(most_bikes)
    index_2 = index(second_most)
    most_bikes = max(data[0][index_1], data[0][index_2])

return most_bikes

Upvotes: 1

Views: 3583

Answers (3)

Joe Iddon

Reputation: 20424

Using a generator inside max()

If you have a CSV file named test.csv, with contents:

line1,3,abc
line2,1,ahc
line3,9,sbc
line4,4,agc

You can use a generator expression inside the max() function for a memory efficient solution (i.e. no list is created).

If you wanted to do this for the second column, then:

max(int(l.split(',')[1]) for l in open("test.csv"))

which would give 9 for this example.


Update

To get the row index of the max number, pair each value in the column with its line index and have max() compare on the value:

max(((i,int(l.split(',')[1])) for i,l in enumerate(open("test.csv").readlines())),key=lambda t:t[1])[0]

which gives 2 here, because the max number in column 2 (which is 9) is on the line of test.csv (above) with index 2 (i.e. the third line).

This works fine, but you may prefer to just break it up slightly:

lines = open("test.csv").readlines()
max(((i,int(l.split(',')[1])) for i,l in enumerate(lines)),key=lambda t:t[1])[0]
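
If you would rather not split the lines by hand, the same idea can be sketched with the standard csv module. This is only a sketch under assumptions: max_row is a hypothetical helper name, and io.StringIO stands in for the file so the example runs without test.csv on disk.

```python
import csv
import io

# Same contents as test.csv above; StringIO stands in for the open file.
sample = io.StringIO("line1,3,abc\nline2,1,ahc\nline3,9,sbc\nline4,4,agc\n")

def max_row(lines, column):
    """Return (row_index, value) for the largest integer in `column`.

    `max_row` is a hypothetical helper, not part of any library.
    """
    reader = csv.reader(lines)
    # Pair each row index with the integer in the chosen column,
    # then let max() compare on the value only.
    return max(
        ((i, int(row[column])) for i, row in enumerate(reader)),
        key=lambda t: t[1],
    )

index, value = max_row(sample, 1)
print(index, value)  # 2 9
```

With a real file you would pass the file object instead, for example inside `with open("test.csv", newline="") as f:`, which also closes the file when you are done.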

Upvotes: 0

s3bw

Reputation: 3049

Another approach that might work better for you when manipulating data is the pandas module.

Then you could do this:

import pandas as pd

data = pd.read_csv('bicycle_data.csv')

# Alternative:
# most_sales = data['sold'].max()
most_sales = max(data['sold'])

Now you don't have to worry about indexing columns with numbers, because columns are addressed by name.

You can also do something like this:

sorted_data = data.sort_values(by='sold', ascending=False)

# Displays top 5 sold bicycles.
print(sorted_data.head(5))

More importantly, if you enjoy using indexes, pandas has a built-in method called idxmax that gives you the index of the max value.
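
A minimal sketch of idxmax, assuming a small hand-made DataFrame in place of bicycle_data.csv (the 'model' column and all the values are made up for illustration; only the 'sold' column name comes from the example above):

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('bicycle_data.csv').
data = pd.DataFrame({
    "model": ["road", "mountain", "city"],
    "sold": [12, 30, 7],
})

# idxmax() returns the index label of the row holding the largest value.
best_index = data["sold"].idxmax()

# .loc then fetches the whole row for that label.
best_row = data.loc[best_index]
print(best_index, best_row["model"], best_row["sold"])  # 1 mountain 30
```

Note that idxmax returns the index label, which only matches the row position when the DataFrame uses the default 0, 1, 2, ... index.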

Upvotes: 2

s3bw

Reputation: 3049

Assuming a csv structure like so:

data = ['1,blue,15,True',
    '2,red,25,False',
    '3,orange,35,False',
    '4,yellow,24,True',
    '5,green,12,True']

If I want to get the max value from the 3rd column I would do this:

# split() yields strings, so convert to int to compare numerically
largest_number = max(int(n.split(',')[2]) for n in data)
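
One thing to watch for: split() returns strings, and max() compares strings character by character rather than by numeric value. The sketch below adds a hypothetical extra row with a one-digit value to show the difference.

```python
data = ['1,blue,15,True',
        '2,red,25,False',
        '3,orange,35,False',
        '6,purple,9,True']   # hypothetical extra row with a one-digit value

# Compared as strings, '9' beats '35' because '9' > '3'.
as_strings = max(n.split(',')[2] for n in data)

# Converted to int, the comparison is numeric.
as_ints = max(int(n.split(',')[2]) for n in data)

print(as_strings, as_ints)  # 9 35
```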

Upvotes: 0
