Reputation: 75
There is a CSV called vic_visitors.csv with this data:
Victoria's Regions,2004,2005,2006,2007
Gippsland,63354,47083,51517,54872
Goldfields,42625,36358,30358,36486
Grampians,64092,41773,29102,38058
Great Ocean Road,185456,153925,150268,167458
Melbourne,1236417,1263118,1357800,1377291
And there is a question asking:
Q. Write a program to find the greatest visitor number in Victoria from the CSV data vic_visitors.csv. Your program should print the result in the format "The greatest visitornumber was 'x' in 'y' in the year 'z'."
I can get as far as reading the data, so that data_2d holds it in two dimensions, with data_2d[i] being a row and data_2d[i][j] the value in column j of that row:
import csv
visitors=open("vic_visitors.csv")
data=csv.reader(visitors)
data_2d=list(data)
But I am quite lost on how to retrieve the maximum number of visitors and its corresponding region and year.
Upvotes: 1
Views: 2078
Reputation: 1122222
You have 4 problems to solve:
csv gives you everything as strings, while you need to compare the values numerically.
You could use a DictReader() to solve the first part. You could either convert the values to integers as you read the file, or convert the values as you determine the maximum. And you could determine the maximum per row as you read, or when you do the last step, in one go.
I'd do as much as possible when reading, discarding any data you don't need in the process:
import csv
maximum_value = None
with open("vic_visitors.csv", 'rb') as visitors:
    reader = csv.DictReader(visitors)
    for row in reader:
        count, year = max((int(row[year]), year) for year in reader.fieldnames[1:])  # skip the first column
        if not maximum_value or count > maximum_value[0]:
            maximum_value = (count, row[reader.fieldnames[0]], year)
print "The greatest visitornumber was {} in {} in the year {}.".format(
    *maximum_value)
The max(...) line loops over the year columns of each row dictionary (whose keys come from the first row of the CSV), so all fields but the first. By putting the numerical value first in each tuple you get the maximum column value for that row, paired with its year.
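To make the tuple trick concrete, here is a small sketch using one row of the question's data. It is written for Python 3 (an assumption; the answer's code targets Python 2), and the names row and year_columns are only for illustration.

```python
# One row of the question's data, as csv.DictReader would yield it (all strings).
row = {"Victoria's Regions": "Gippsland",
       "2004": "63354", "2005": "47083", "2006": "51517", "2007": "54872"}
year_columns = ["2004", "2005", "2006", "2007"]

# Strings compare character by character, which is wrong for numbers.
print(max("9", "10"))  # 9

# Tuples compare element by element, so the integer count decides the
# ordering and the year simply travels along with it.
count, year = max((int(row[y]), y) for y in year_columns)
print(count, year)  # 63354 2004
```

If two years had the same count, the second tuple element would break the tie, so the later year would win.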
We then store the maximum row information found so far (just the count, the region and the year); there is no need to keep any other rows. That tuple is formatted at the end by plugging those 3 values into a template.
By using the DictReader.fieldnames list we keep this flexible; as long as the first column is a region and the remainder are years, the code will adapt to any changes.
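As a rough illustration of what fieldnames holds, the following Python 3 sketch reads the header and one row of the question's data from memory. The names sample, region_column and year_columns are hypothetical and not part of the answer's code.

```python
import csv
import io

# Header plus one row of the question's data, read from memory.
sample = io.StringIO(
    "Victoria's Regions,2004,2005,2006,2007\n"
    "Gippsland,63354,47083,51517,54872\n"
)
reader = csv.DictReader(sample)

# fieldnames is taken from the first line of the input.
region_column = reader.fieldnames[0]
year_columns = reader.fieldnames[1:]
print(region_column)  # Victoria's Regions
print(year_columns)   # ['2004', '2005', '2006', '2007']
```

Because nothing here names a specific year, adding or removing year columns in the file only changes the contents of year_columns.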
Demo:
>>> import csv
>>> sample = '''\
... Victoria's Regions,2004,2005,2006,2007
... Gippsland,63354,47083,51517,54872
... Goldfields,42625,36358,30358,36486
... Grampians,64092,41773,29102,38058
... Great Ocean Road,185456,153925,150268,167458
... Melbourne,1236417,1263118,1357800,1377291
... '''.splitlines(True)
>>> maximum_value = None
>>> reader = csv.DictReader(sample)
>>> for row in reader:
...     count, year = max((int(row[year]), year) for year in reader.fieldnames[1:])  # skip the first column
...     if not maximum_value or count > maximum_value[0]:
...         maximum_value = (count, row[reader.fieldnames[0]], year)
...
>>> print "The greatest visitornumber was {} in {} in the year {}.".format(
...     *maximum_value)
The greatest visitornumber was 1377291 in Melbourne in the year 2007.
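The code above is Python 2. A possible Python 3 port is sketched below; the main differences are that the file is opened in text mode with newline="" rather than 'rb', and print is a function. The function name greatest_visitors is made up for this sketch, and the temporary file exists only so that the example runs without vic_visitors.csv being present.

```python
import csv
import os
import tempfile

def greatest_visitors(path):
    """Return (count, region, year) for the largest value in the CSV at path."""
    best = None
    # Python 3: text mode with newline="" instead of Python 2's 'rb'.
    with open(path, newline="") as visitors:
        reader = csv.DictReader(visitors)
        for row in reader:
            count, year = max((int(row[y]), y) for y in reader.fieldnames[1:])
            if best is None or count > best[0]:
                best = (count, row[reader.fieldnames[0]], year)
    return best

# Write the question's data to a temporary file so the sketch is self-contained.
data = """\
Victoria's Regions,2004,2005,2006,2007
Gippsland,63354,47083,51517,54872
Goldfields,42625,36358,30358,36486
Grampians,64092,41773,29102,38058
Great Ocean Road,185456,153925,150268,167458
Melbourne,1236417,1263118,1357800,1377291
"""
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write(data)
try:
    result = greatest_visitors(tmp.name)
    print("The greatest visitornumber was {} in {} in the year {}.".format(*result))
finally:
    os.remove(tmp.name)
```

With the real file in place, calling greatest_visitors("vic_visitors.csv") would replace the temporary-file scaffolding.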
Upvotes: 4
Reputation: 1402
You can use the following approach, which scans every entry and updates the running maximum, together with its year and region, whenever an entry exceeds the current maximum.
import csv
with open('vic_visitors.csv') as f:
    reader = csv.DictReader(f)
    # max_count avoids shadowing the built-in max(); int() keeps the
    # printed result free of a trailing ".0"
    max_count = 0
    for row in reader:
        if int(row['2004']) > max_count:
            max_count = int(row['2004'])
            maxyear = '2004'
            maxloc = row["Victoria's Regions"]
        if int(row['2005']) > max_count:
            max_count = int(row['2005'])
            maxyear = '2005'
            maxloc = row["Victoria's Regions"]
        if int(row['2006']) > max_count:
            max_count = int(row['2006'])
            maxyear = '2006'
            maxloc = row["Victoria's Regions"]
        if int(row['2007']) > max_count:
            max_count = int(row['2007'])
            maxyear = '2007'
            maxloc = row["Victoria's Regions"]
print("The greatest visitornumber was " + str(max_count) + " in " + maxloc + " in the year " + maxyear + ".")
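The four blocks differ only in the year, so they could be folded into an inner loop. The sketch below (Python 3) does that on the list of lists from the question, data_2d, assuming every column after the first is a year. The in-memory sample stands in for the file, and the names header, rows, max_count, max_loc and max_year are chosen for this example.

```python
import csv
import io

# The question's data, held in memory instead of vic_visitors.csv.
sample = io.StringIO("""\
Victoria's Regions,2004,2005,2006,2007
Gippsland,63354,47083,51517,54872
Goldfields,42625,36358,30358,36486
Grampians,64092,41773,29102,38058
Great Ocean Road,185456,153925,150268,167458
Melbourne,1236417,1263118,1357800,1377291
""")
data_2d = list(csv.reader(sample))

header, rows = data_2d[0], data_2d[1:]
max_count, max_loc, max_year = None, None, None
for row in rows:
    # Column 0 holds the region name; the remaining columns are years.
    for j in range(1, len(header)):
        count = int(row[j])
        if max_count is None or count > max_count:
            max_count, max_loc, max_year = count, row[0], header[j]

print("The greatest visitornumber was {} in {} in the year {}.".format(
    max_count, max_loc, max_year))
```

Starting from None rather than 0 also covers the case where the file has a header but no data rows, although the final print would then show None.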
Upvotes: 0