Reputation: 1643
I am trying to join two CSV files based on one common column.
I am reading the CSV file storing a list of tuples. My code:
def read_csv(path):
file = open(path, "r")
content_list = []
for line in file.readlines():
record = line.split(",")
for i in range(len(record)):
record[i] = record[i].replace("\n","")
content_list.append(tuple(record))
return content_list
a_list = read_csv("a.csv")
b_list = read_csv("b.csv")
This is giving me list with headers of CSV as first tuple in the list
a_list
[('user_id', 'activeFl'),
('80c611f1-532a-4f7d-aa80-f28b472c0dbe', 'True'),
('4d04ab57-1b50-4474-bd12-b2b16ed2cca3', 'True'),
('0f37a42a-a984-4402-97bd-0eac95fa95d1', 'True'),
('dbe15b19-0128-4e3a-a82b-c8154d272c18', 'True'), ......]
b_list
[('id','date','user_id','blockedFl','amount','type'),
('b7819826-6468-4416-9953-e739d8046b81','2021-04-23','18a382ef-bd38-4884-8bf','True,'9.04','6'), ....]
I would like to merge these two lists based on the user_id, but I am stuck at this point. What can I try next?
Upvotes: 0
Views: 1371
Reputation: 82765
This is one approach using csv
module and a dict
Ex:
import csv
def read_csv(path):
with open(path) as infile:
reader = csv.reader(infile)
header = next(reader)
content = {i[0]: i for i in reader} # UserID as key
return content
a_list = read_csv("a.csv")
b_list = read_csv("b.csv")
merge_data = {k: v + [a_list.get(k)] for k, v in b_list.items()}
print(merge_data) # OR print(list(merge_data.values()))
Upvotes: 0
Reputation:
the O(N^2)
solution is:
result = list()
for left in a_list[1:]:
for right in b_list[1:]:
if left[0] == right[0]:
result.append(right + left[1:])
break
O(N)
using dictionary:
result =list()
b_dict = {x[0]: x for x in b_list[1:]}
for left in a_list[1:]:
if left[0] in b_dict:
result.append(b_dict.get(left[0]) + left[1:])
Upvotes: 1