Reputation: 1643

Joining two CSV files (inner join) based on a common column in Python without Pandas

I am trying to join two CSV files based on one common column.

I am reading the CSV file storing a list of tuples. My code:

def read_csv(path):
    file = open(path, "r")
    content_list = []    
    for line in file.readlines():
        record = line.split(",")
        for i in range(len(record)):
            record[i] = record[i].replace("\n","")    
        content_list.append(tuple(record))
    return content_list

a_list = read_csv("a.csv")
b_list = read_csv("b.csv")

This is giving me list with headers of CSV as first tuple in the list

a_list

[('user_id', 'activeFl'),
 ('80c611f1-532a-4f7d-aa80-f28b472c0dbe', 'True'),
 ('4d04ab57-1b50-4474-bd12-b2b16ed2cca3', 'True'),
 ('0f37a42a-a984-4402-97bd-0eac95fa95d1', 'True'),
 ('dbe15b19-0128-4e3a-a82b-c8154d272c18', 'True'), ......]

b_list

[('id','date','user_id','blockedFl','amount','type'),
 ('b7819826-6468-4416-9953-e739d8046b81','2021-04-23','18a382ef-bd38-4884-8bf','True,'9.04','6'), ....]

I would like to merge these two lists based on the user_id, but I am stuck at this point. What can I try next?

Upvotes: 0

Answers (2)

Rakesh

Reputation: 82765

This is one approach using csv module and a dict

Ex:

import csv

def read_csv(path):
    with open(path) as infile:
        reader = csv.reader(infile)
        header = next(reader)
        content = {i[0]: i for i in reader}   # UserID as key
        return content 

a_list = read_csv("a.csv")
b_list = read_csv("b.csv")

merge_data = {k: v + [a_list.get(k)] for k, v in b_list.items()}
print(merge_data) # OR print(list(merge_data.values()))

Upvotes: 0

user8060120

Reputation:

the O(N^2) solution is:

result = list()
for left in a_list[1:]:
    for right in b_list[1:]:
        if left[0] == right[0]:
            result.append(right + left[1:])
            break

O(N) using dictionary:

result =list()
b_dict = {x[0]: x for x in b_list[1:]}
for left in a_list[1:]:
    if left[0] in b_dict:
        result.append(b_dict.get(left[0]) + left[1:])

Upvotes: 1

Joining two CSV files (inner join) based on a common column in Python without Pandas

Answers (2)

Related Questions