Reputation: 253
I am formatting CSV in Python to get the desired result but my code doesn't seem to work properly.
I have the first CSV file in format:
2,a
1,a
4,a
5,a
3,a
1,a
3,b
2,b
1,a
The second CSV file in format:
1,a,123
1,a,234
2,a,456
2,b,345
3,a,789
3,b,232
4,a,987
As the first CSV file is not sorted, the second CSV file is sorted in increasing order with respect to first column
I want the output in format:
2,a,456
1,a,123
4,a,987
5,a
3,a,789
1,a,234
3,b,232
2,b,345
1,a
The result is printed in respect to the first CSV, if the first CSV file combination is not found in the second CSV File, for example if 5,a
is not in the second CSV file then just 5,a
is printed in its respected position. The first CSV file contains many duplicates, while in the second CSV file each row is unique.
Here is my Code
for (num,alpha) in first_csv:
value_found = True
for (num1,alpha1,num2) in second_csv:
if (num == num1 and alpha == alpha1):
csv_out += str(num) + ',' + str(alpha) + ',' + str(number)
value_found = False
if value_found:
count+=1
if count == 1:
csv_out += str(num) + ',' + str(alpha)
first_csv and second_csv are tuples I have created after reading the CSV files with the code:
with open('first_csv.csv') as f:
f.readline()
first_csv = tuple(csv.reader(f, delimiter=','))
with open('second_csv.csv') as f:
f.readline()
second_csv = tuple(csv.reader(f, delimiter=','))
But it is not printing the desired output, where am I doing wrong?
Upvotes: 2
Views: 202
Reputation: 25207
This collects the num2
values from second file into a dictionary of deque
s. When matches are found, they are removed using deque.popleft
to use each match only once, in the same order as they occurred in the second file.
from collections import defaultdict, deque
with open('second_csv.csv') as f:
next(f) #skip header
dic = defaultdict(deque)
for num1,alpha1,num2 in csv.reader(f, delimiter=','):
dic[num1, alpha1].append(num2)
with open('first_csv.csv') as f, open('out.csv', 'wb') as fout:
next(f) #skip header
csv_out = csv.writer(fout)
for num,alpha in csv.reader(f, delimiter=','):
try:
num2 = dic[num,alpha].popleft()
csv_out.writerow([num,alpha,num2])
except IndexError:
csv_out.writerow([num,alpha])
deque
retains the order of items that you add by append
and remove by popleft
.dict
does fast lookup by key which can be a tupledefaultdict(deque)
is a dict
that creates an empty deque
automatically when you access a missing key, so you can append
to it directly.Upvotes: 3
Reputation: 7821
This should do the trick. Note that, for each iteration over first_csv, the worst-case scenario is that it would have to iterate to the end of what's left of the data from the second csv file (the rows gets pop-ed if matched).
import csv
with open("second_csv.csv") as in_file:
reader = csv.reader(in_file)
lookup = list(reader)
with open("first_csv.csv") as in_file, open('output.csv', 'wb') as out_file:
reader = csv.reader(in_file)
writer = csv.writer(out_file)
for row in reader:
for i, data in enumerate(lookup):
if row == data[:2]:
row = lookup.pop(i)
break
writer.writerow(row)
output.csv
2,a,456
1,a,123
4,a,987
5,a
3,a,789
1,a,234
3,b,232
2,b,345
1,a
Upvotes: 2