How to find maximum number from csv-read data in Python 2.7?

Question

There is a CSV called vic_visitors.csv with this data:

Victoria's Regions,2004,2005,2006,2007
Gippsland,63354,47083,51517,54872
Goldfields,42625,36358,30358,36486
Grampians,64092,41773,29102,38058
Great Ocean Road,185456,153925,150268,167458
Melbourne,1236417,1263118,1357800,1377291

And there is a question asking:

Q. Write a program to find the greatest visitor number in Victoria from the CSV data vic_visitors.csv. Your program should print the result in the format "The greatest visitornumber was 'x' in 'y' in the year 'z'."

I can get as far as reading the data, so that data_2d holds it as a two-dimensional list where data_2d[i] is a row and data_2d[i][j] is a column value within that row:

import csv
visitors=open("vic_visitors.csv")
data=csv.reader(visitors)
data_2d=list(data)

But I am quite lost on how to retrieve the maximum number of visitors and its corresponding region and year.

Martijn Pieters · Accepted Answer

You have 4 problems to solve:

1. You need to keep the column titles so you can report the year properly.
2. csv gives you everything as strings, while you need to compare the values numerically.
3. You need to find the maximum value for each row.
4. You need to find the maximum row, by that maximum value for a given row.

You could use a DictReader() to solve the first part. You could either convert the values to integers as you read the file, or convert the values as you determine the maximum. And you could determine the maximum per row as you read, or when you do the last step, in one go.
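As a rough sketch of the "in one go" option mentioned above: convert every cell to an integer up front, collect (count, region, year) tuples, and take a single max() over all of them. The names sample, region_column, year_columns, candidates and best are illustrative only, and the data is inlined as a list of lines (an assumption made so the snippet runs without the file).

```python
import csv

# Inline copy of vic_visitors.csv so the sketch runs without the file.
sample = [
    "Victoria's Regions,2004,2005,2006,2007",
    "Gippsland,63354,47083,51517,54872",
    "Goldfields,42625,36358,30358,36486",
    "Grampians,64092,41773,29102,38058",
    "Great Ocean Road,185456,153925,150268,167458",
    "Melbourne,1236417,1263118,1357800,1377291",
]

reader = csv.DictReader(sample)
region_column = reader.fieldnames[0]   # first header is the region column
year_columns = reader.fieldnames[1:]   # remaining headers are the years

# One (count, region, year) tuple per cell, with the count converted to int.
candidates = [
    (int(row[year]), row[region_column], year)
    for row in reader
    for year in year_columns
]

# Tuples compare element by element, so the largest count wins.
best = max(candidates)
```

This keeps every cell in memory at once, which is fine for a file this small; the streaming version below avoids that by keeping only the best tuple seen so far.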

I'd do as much as possible when reading, discarding any data you don't need in the process:

import csv

maximum_value = None  # will hold (count, region, year) for the best cell so far
with open("vic_visitors.csv", 'rb') as visitors:  # csv wants binary mode in Python 2
    reader = csv.DictReader(visitors)
    for row in reader:
        # largest (count, year) pair in this row
        count, year = max((int(row[year]), year) for year in reader.fieldnames[1:])  # skip the first column
        if not maximum_value or count > maximum_value[0]:
            maximum_value = (count, row[reader.fieldnames[0]], year)

print "The greatest visitornumber was {} in {} in the year {}.".format(
    *maximum_value)

The max(...) line loops over the year columns of each row dictionary (DictReader uses the first row of the CSV as the keys), so all fields but the first, which holds the region name. Because each tuple puts the numerical value first, max() returns the largest column value for that row, paired with its year.
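To make the tuple trick concrete, here is a small sketch for a single row, written out by hand as the dictionary DictReader would produce for Gippsland. The variable names are illustrative only. It also shows why the int() conversion matters: compared as strings, "63354" sorts above "185456".

```python
# One row as DictReader would yield it: every value is still a string.
row = {
    "Victoria's Regions": "Gippsland",
    "2004": "63354",
    "2005": "47083",
    "2006": "51517",
    "2007": "54872",
}
year_columns = ["2004", "2005", "2006", "2007"]

# (count, year) pairs; max() compares the count first, then the year on ties.
pairs = [(int(row[column]), column) for column in year_columns]
count, year = max(pairs)

# Without int(), strings compare character by character: "6" > "1".
string_max = max(["63354", "185456"])
numeric_max = max([int("63354"), int("185456")])
```

Here count is 63354 and year is "2004", while string_max is "63354" even though 185456 is the larger number.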

We then store the maximum row information found so far (just the count, the region and the year); there is no need to keep any other rows. That tuple is then formatted at the end by plugging those 3 values into a template.

By using the DictReader.fieldnames list we keep this flexible; as long as the first column is a region and the remainder are years the code will adapt to any changes.
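As an illustration of that flexibility, the loop can be wrapped in a function and fed a file with a different header. The function name find_greatest and the three-line data set below are made up for this sketch (they are not part of the exercise); only the loop itself follows the code above.

```python
import csv

def find_greatest(lines):
    """Return (count, region, year) for the largest cell, or None if no rows."""
    maximum_value = None
    reader = csv.DictReader(lines)
    for row in reader:
        # Largest (count, year) pair in this row, skipping the region column.
        count, year = max(
            (int(row[column]), column) for column in reader.fieldnames[1:]
        )
        if maximum_value is None or count > maximum_value[0]:
            maximum_value = (count, row[reader.fieldnames[0]], year)
    return maximum_value

# Hypothetical data with a different region header and different years.
other = [
    "Region,2019,2020,2021",
    "North,10,30,20",
    "South,25,5,15",
]
result = find_greatest(other)
```

Nothing in find_greatest names a specific year or region header, so the same function handles both this invented table and vic_visitors.csv.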

Demo:

>>> import csv
>>> sample = '''\
... Victoria's Regions,2004,2005,2006,2007
... Gippsland,63354,47083,51517,54872
... Goldfields,42625,36358,30358,36486
... Grampians,64092,41773,29102,38058
... Great Ocean Road,185456,153925,150268,167458
... Melbourne,1236417,1263118,1357800,1377291
... '''.splitlines(True)
>>> maximum_value = None
>>> reader = csv.DictReader(sample)
>>> for row in reader:
...     count, year = max((int(row[year]), year) for year in reader.fieldnames[1:])  # skip the first column
...     if not maximum_value or count > maximum_value[0]:
...         maximum_value = (count, row[reader.fieldnames[0]], year)
... 
>>> print "The greatest visitornumber was {} in {} in the year {}.".format(
...     *maximum_value)
The greatest visitornumber was 1377291 in Melbourne in the year 2007.
