joe___pike
joe___pike

Reputation: 11

compare two csv files and do something if there is a match between two fields

So I have a nested for loop

for rowst in student:
    for rowtu in tutor:
        if rowst['RegGroup'][-3:] in rowtu['StaffCode']:
            print (rowst['RegGroup'][-3:],rowtu['StaffCode'])
            print("----------------------------------------")

student is student = csv.DictReader(fr) tutor is tutor = csv.DictReader(fr2)

What I am trying to achieve is to compare the two files to check if the last 3 characters in student RegGroup match the tutors staffcode. Then print something as shown in the code.

the result I get is:

FLI FLI
----------------------------------------

This suggests it is only working for the first or last value of the first for loop which isn't what I want. I have checked and Yes there are more than one reg group's that matches because I used the reg groups in the student file to populate a unique list of tutors staff codes.

can anyone tell me where i'm going wrong as my friend seems to think that my implementation should work?

as requested, some of the csv data (can't share for gdpr reasons but can show the two fields im comparing)

tutors.csv

StaffCode
FLI
RTH
POD
DFI
LNO
VAI
HPI
LNE
SLA
ASP
HST
RCO
WKI
GBA
RKI
BPE
SMI
NRY
CSC

subset of students.csv (the XX represents a yeargroup)

RegGroup
XXFLI
XXRTH
XXPOD
XXDFI
XXLNO
XXVAI
XXFLI
XXLNO
XXHPI
XXLNO
XXPOD
XXHPI
XXLNE
XXLNO
XXRTH
XXHPI
XXRTH
XXLNO
XXVAI
XXDFI
XXVAI
XXFLI
XXRTH
XXFLI
XXLNE
XXDFI
XXVAI
XXLNE

Upvotes: 0

Views: 248

Answers (2)

jeevs
jeevs

Reputation: 131

the program is trying to compare value in each with the same row in student.csv with the tutor.csv. But, what i understand from what you need is to join the two datasets on the column of StaffCode. Hence, regular regular csv read write might be laborious to do, i recommend using pandas that will help. here is the code.

import pandas as pd
stu = pd.read_csv('./data/students.csv')
tutors = pd.read_csv('./data/tutors.csv')
stu.head()
Out[19]: 
  RegGroup
0    XXFLI
1    XXRTH
2    XXPOD
3    XXDFI
4    XXLNO
tutors.head() # i added a column as 'Staff Name' for fun
Out[20]: 
  StaffCode  Staff Name
0       FLI      Fly Li
1       RTH      Ruth G
2       POD     Podrick
3       DFI   Daren Fli
4       LNO      Lenord
# extract the last three characters as 'StaffCode'
stu['StaffCode'] = stu.apply(lambda row: row['RegGroup'][-3:], axis=1)
stu.head()
Out[22]: 
  RegGroup StaffCode
0    XXFLI       FLI
1    XXRTH       RTH
2    XXPOD       POD
3    XXDFI       DFI
4    XXLNO       LNO
stu_new = pd.merge(stu, tutors, how='left', on=['StaffCode']) # merge it on the StaffCode column and do some cool stuff
stu_new.head()
Out[24]: 
  RegGroup StaffCode  Staff Name
0    XXFLI       FLI      Fly Li
1    XXRTH       RTH      Ruth G
2    XXPOD       POD     Podrick
3    XXDFI       DFI   Daren Fli
4    XXLNO       LNO      Lenord

Just the code

import pandas as pd
stu = pd.read_csv('./data/students.csv')
tutors = pd.read_csv('./data/tutors.csv')
stu.head()
tutors.head()
stu['StaffCode'] = stu.apply(lambda row: row['RegGroup'][-3:], axis=1)
stu.head()
stu_new = pd.merge(stu, tutors, how='left', on=['StaffCode'])
stu_new.head()

Upvotes: 1

Harpe
Harpe

Reputation: 316

The dictReader is an iterator that goes over the file once and has to be re-initiated after reading the file once.

Here is a code example that works, but is not really elegant:

with open("tutor.csv") as stu:
    student = csv.DictReader(stu)
    for rowst in student:
        with open("student.csv") as tu:
            tutor = csv.DictReader(tu)
            for rowtu in tutor:
                if rowst['RegGroup'][-3:] in rowtu['StaffCode']:
                    print (rowst['RegGroup'][-3:],rowtu['StaffCode'])
                    print("----------------------------------------")

the line "with open..." creates a context in which the file is available and is automatically closed afterwards. However for large files, this is not something you want to repeat and you should store your data in an appropriate object.

For that, you can use something like numpy.loadtxt.

Upvotes: 1

Related Questions