Tilak
Tilak

Reputation: 3

How to compare two csv files in Python

I have two csv files. One is called 'Standard reg.csv', the other is 'Driver Details.csv'

In 'Standard reg.csv' the first two lines are:

['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit']
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']

The first two lines in Driver Details.csv are:

['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration']
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']

My code is this:

import csv
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',')
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',')
for row in file_1:
    reg = row[3]
    avgspeed = row[7]
    for row in file_2:
        firstname = row[0]
        lastname = row[1]
        address = row[2]
        city = row[3]
        region = row[4]
        reg2 = row[7]
if reg  == reg2:
    print('Match found')
else:
    print('No match found')

It's a work-in-progress, but I can't seem to get the code to compare more than just the last line.

With print(reg) after this line: reg2 = row[7]

it shows it has read that whole column. The entire column is also printed when I do print(reg2) after:reg2 = row[7]

But at if reg == reg2: it only reads the last lines of both columns and compares them and I'm not sure how to fix this.

Thank you in advance.

Upvotes: 0

Views: 2257

Answers (3)

Martin Evans
Martin Evans

Reputation: 46789

I suggest you first load all of the details from the Driver Details.csv into a dictionary, using the registration number as the key. This would then allow you to easily look up a given entry without having to keep reading all of the lines from the file again:

import csv

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError as e:
            print('No match found')

The code as you have it will iterate through file_2 and leave the file pointer either at the end (if no match is found) or at the location of a match (potentially missing matches earlier on for the next entry). For your approach to work you would have to start reading the file from the start for each loop, which would be very slow.


To add an output csv and display the full address you could do something like the following:

import csv

speed = 74.3
fine = 35

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header
    csv_output = csv.writer(f_output)

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError as e:
            print('No match found')

This would print the following:

Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE

And produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35

Upvotes: 1

aghast
aghast

Reputation: 15320

Try csv.DictReader to eliminate most of your lines of code:

import csv
Violations = defaultdict(list)

# Read in the violations, there are probably less violations than drivers (I hope!)
with open('Standard reg.csv') as violations:
    for v in csv.DictReader(violations):
        Violations[v['Reg Plate']] = v

with open('Driver Details.csv') as drivers:
    for d in csv.DictReader(drivers):
        fullname = "{driver.FirstName} {driver.LastName}".format(driver=d)
        if d['Registration'] in Violations:
            count = len(Violations[d['Registration']])
            print("{fullname} has {count} violations.".format(fullname=fullname, count=count))
        else:
            print("{fullname} is too fast to catch!".format(fullname=fullname))

Upvotes: 0

Sci Prog
Sci Prog

Reputation: 2691

The testing condition if reg == reg2 appears outside both loops (for file_1 and for file_2). That is why the testing is only done with the last line from each file.

Another problem is that you use the same loop variable row in both for loops.

Upvotes: 1

Related Questions