Reputation: 11
So I have a nested for loop
for rowst in student:
for rowtu in tutor:
if rowst['RegGroup'][-3:] in rowtu['StaffCode']:
print (rowst['RegGroup'][-3:],rowtu['StaffCode'])
print("----------------------------------------")
student is student = csv.DictReader(fr)
tutor is tutor = csv.DictReader(fr2)
What I am trying to achieve is to compare the two files to check if the last 3 characters in student RegGroup match the tutors staffcode. Then print something as shown in the code.
the result I get is:
FLI FLI
----------------------------------------
This suggests it is only working for the first or last value of the first for loop which isn't what I want. I have checked and Yes there are more than one reg group's that matches because I used the reg groups in the student file to populate a unique list of tutors staff codes.
can anyone tell me where i'm going wrong as my friend seems to think that my implementation should work?
as requested, some of the csv data (can't share for gdpr reasons but can show the two fields im comparing)
tutors.csv
StaffCode
FLI
RTH
POD
DFI
LNO
VAI
HPI
LNE
SLA
ASP
HST
RCO
WKI
GBA
RKI
BPE
SMI
NRY
CSC
subset of students.csv (the XX represents a yeargroup)
RegGroup
XXFLI
XXRTH
XXPOD
XXDFI
XXLNO
XXVAI
XXFLI
XXLNO
XXHPI
XXLNO
XXPOD
XXHPI
XXLNE
XXLNO
XXRTH
XXHPI
XXRTH
XXLNO
XXVAI
XXDFI
XXVAI
XXFLI
XXRTH
XXFLI
XXLNE
XXDFI
XXVAI
XXLNE
Upvotes: 0
Views: 248
Reputation: 131
the program is trying to compare value in each with the same row in student.csv with the tutor.csv. But, what i understand from what you need is to join the two datasets on the column of StaffCode. Hence, regular regular csv read write might be laborious to do, i recommend using pandas that will help. here is the code.
import pandas as pd
stu = pd.read_csv('./data/students.csv')
tutors = pd.read_csv('./data/tutors.csv')
stu.head()
Out[19]:
RegGroup
0 XXFLI
1 XXRTH
2 XXPOD
3 XXDFI
4 XXLNO
tutors.head() # i added a column as 'Staff Name' for fun
Out[20]:
StaffCode Staff Name
0 FLI Fly Li
1 RTH Ruth G
2 POD Podrick
3 DFI Daren Fli
4 LNO Lenord
# extract the last three characters as 'StaffCode'
stu['StaffCode'] = stu.apply(lambda row: row['RegGroup'][-3:], axis=1)
stu.head()
Out[22]:
RegGroup StaffCode
0 XXFLI FLI
1 XXRTH RTH
2 XXPOD POD
3 XXDFI DFI
4 XXLNO LNO
stu_new = pd.merge(stu, tutors, how='left', on=['StaffCode']) # merge it on the StaffCode column and do some cool stuff
stu_new.head()
Out[24]:
RegGroup StaffCode Staff Name
0 XXFLI FLI Fly Li
1 XXRTH RTH Ruth G
2 XXPOD POD Podrick
3 XXDFI DFI Daren Fli
4 XXLNO LNO Lenord
Just the code
import pandas as pd
stu = pd.read_csv('./data/students.csv')
tutors = pd.read_csv('./data/tutors.csv')
stu.head()
tutors.head()
stu['StaffCode'] = stu.apply(lambda row: row['RegGroup'][-3:], axis=1)
stu.head()
stu_new = pd.merge(stu, tutors, how='left', on=['StaffCode'])
stu_new.head()
Upvotes: 1
Reputation: 316
The dictReader is an iterator that goes over the file once and has to be re-initiated after reading the file once.
Here is a code example that works, but is not really elegant:
with open("tutor.csv") as stu:
student = csv.DictReader(stu)
for rowst in student:
with open("student.csv") as tu:
tutor = csv.DictReader(tu)
for rowtu in tutor:
if rowst['RegGroup'][-3:] in rowtu['StaffCode']:
print (rowst['RegGroup'][-3:],rowtu['StaffCode'])
print("----------------------------------------")
the line "with open..." creates a context in which the file is available and is automatically closed afterwards. However for large files, this is not something you want to repeat and you should store your data in an appropriate object.
For that, you can use something like numpy.loadtxt.
Upvotes: 1