Reputation: 23
I have 2 files
File 1:
| Name | Age | Place |Work_Start|
|:---- |:------:| -----: | -----:|
|Ester | 27 | Beijing| 8 |
|Jack | 29 | Tokyo | 9 |
|Mary | 31 |New_York| 8 |
|Leo | 25 |England | 10 |
File 2:
| Name | Age | Place |Work_Start|
|:---- |:------:| -----: | -----:|
|Ethan | 29 |Osaka | 7 |
|Mary | 31 |New_York| 8 |
|Leo | 25 |England | 9 |
I want to extract common rows based on column 1 2 and 3, and write them in separate files. After extracting, it would be like this
File 1 common:
| Name | Age | Place |Work_Start|
|:---- |:------:| -----: | -----:|
|Mary |31 |New_York| 8 |
|Leo |25 |England | 10 |
File 2:
| Name | Age | Place |Work_Start|
|:---- |:------:| -----: | -----:|
|Mary | 31 |New_York| 8 |
|Leo | 25 |England | 9 |
My attempted code
import pandas as pd
import csv
file1 = open('file1.csv')
file2 = open('file2.csv')
file1_common = open('file1_common.csv', 'w')
file1_r=csv.reader(file1)
file2_r=csv.reader(file2)
file1_common_w = csv.writer(file1_common)
count = 0
file2_set = set()
header1 = next(file1_r)
header2 = next(file2_r)
file1_common_w.writerow(header1)
for row2 in file2_r:
file2_set.add(row2[0])
for row1 in file1_r:
for row2 in file2_set:
if (row1[0] in row2[0]) and (row1[1] in row2[1]) and (row1[2] in row2[2]):
break
else:
count=count+1
file1_common_w.writerow(row1)
file1.close()
file2.close()
It gave out the File 1 info not common rows. And for file2_common, I was thinking to reverse row1 and row2. Would that work? Thanks for the help.
(I actually use the code as template from another post but I forgot which.)
Upvotes: 0
Views: 109
Reputation: 92440
If you use the DictReader
from csv
you can get dicts of each row in your csv. Using operator.itemgetter
is a convenient way to pull specific keys from those dicts which will allow you to make tuples of the value that you can add to sets. Doing that, you can just take the intersection:
import csv
from operator import itemgetter
getter = itemgetter('Name','Age','Place')
file1 = open(file1_path)
file2 = open(file2_path)
file1_r=csv.DictReader(file1)
file2_r=csv.DictReader(file2)
set(map(getter, file1_r)).intersection(map(getter, file2_r))
This will give the set:
{('Leo', '25', 'England'), ('Mary', '31', 'New')}
which you can convert back to csv.
Upvotes: 1