Rohit
Rohit

Reputation: 868

Merging records from two '.CSV' files in python

I have two '.csv' files in the below format: First File :

Roll_num Class_Name
1         ABC
2         DEF
5         PQR
27        UVW

Second File :

Roll_num Marks Grade
1        75    A
2        60    C
27       68    B
61       45    E

Now i want to add a column in the second file appending a column 'Class_Name' from First File. The data in both the files has duplicates in it and is not sorted.

I have written the following code that writes our required data from 2 files into a new file.

import csv

path="xyz"
file_read=open(path + "ClassName.CSV", "r")
reader_ClassName = csv.reader(file_read)

read_all_data=open(path + "Student.CSV", "r")
reader_Student =csv.reader(read_all_data)
write_all_data=open( path +"Student_Classname.CSV", "w")

for line_Student in reader_Student:
        Roll_Student=line_Student[0]
        for line_ClassName in reader_ClassName:
            Roll_ClassName=line_ClassName[0]
            ClassName=line_ClassName[1]         
            if(Roll_ClassName == Roll_Student):
                string= Roll_Student +","+ClassName  +"\n"
                print string
                write_all_data.write(string)
                break  

Output Expected :

Roll_num Marks Grade Class_Name
1        75    A     ABC
2        60    C     DEF
27       68    B     UVW
61       45    E     LMN

Output our code generates:

   Roll_num Marks Grade Class_Name
    1        75    A     ABC
    2        60    C     DEF

There is some issue in reading the Third line from Second inner 'for' loop. We have hundreds of thousands of records in both the files.

Upvotes: 0

Views: 78

Answers (2)

Kannan Mohan
Kannan Mohan

Reputation: 1850

I have named first CSV file as hash.csv and second CSV file as data.csv. Below script will help you.

import re

# Building up a hash with roll_num and class_name from hash.csv
chash = dict([ re.split('\s+', x.strip()) for x in open('hash.csv').readlines()][1:])

# Building a list of students record from data.csv
data = [ re.split('\s+', x.strip()) for x in open('data.csv').readlines() ][1:]

# Iterating through each data
for x in data:
    if x[0] in chash:
        x.append(chash[x[0]])
        print('{0:<5} {1:<5} {2:<5} {3:<5}'.format(*x))
    else:
        print('{0:<5} {1:<5} {2:<5}'.format(*x))

Output:

1     75    A  ABC
2     60    C  DEF
27    68    B  UVW
61    45    E

Upvotes: 0

Phuc Tran
Phuc Tran

Reputation: 160

I suggest to avoid loop in loop by reading the whole ClassName.csv file and put into an dictionary first. I suggest the idea below

mydict = {}
for each_line in ClassName_csvfile:
     rollnum = get_roll_num()
     classname = get_class_name()
     mydict[rollnum]=classname 

for each_line in Student_csv_file:
     rollnum = get_roll_num()
     mark = get_mark()
     grade = get_grade()
     Classname = ''
     if mydict.has_key(rollnum):
        Classname = mydict[rollnum]
     writetofile(rollnum, mark, grade, Classname)

Update: you can use if rollnum in mydict: instead of mydict.has_key(rollnum) if you are using Python 2.3+. I am using python 2.7 and both works

P/s: Sorry for not commenting as it requires me 50 reputations

Upvotes: 1

Related Questions