Reputation: 457
I have a csv file like this: (no headers)
aaa,1,2,3,4,5
bbb,2,3,4,5,6
ccc,3,5,7,8,5
ddd,4,6,5,8,9
I want to search another csv file: (no headers)
bbb,1,2,3,4,5,,6,4,7
kkk,2,3,4,5,6,5,4,5,6
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
sss,1,2,3,4,5,3,5,3,5
and print the rows of the second file whose first column matches a first column in the first file. So the results will be:
bbb,1,2,3,4,5,,6,4,7
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
I have the following code, but it does not print anything:
labels = []
with open("csv1.csv", "r") as f:
    f.readline()
    for line in f:
        labels.append((line.strip("\n")))
with open("csv2.csv", "r") as f:
    f.readline()
    for line in f:
        if (line.split(",")[1]) in labels:
            print (line)
If possible, could you tell me how to do this, please? What is wrong with my code? Thanks in advance!
Upvotes: 1
Views: 57
Reputation: 4744
This is one solution, although you may also look into csv-specific tools and pandas as suggested:
labels = []
with open("csv1.csv", "r") as f:
    lines = f.readlines()
    for line in lines:
        labels.append(line.split(',')[0])
with open("csv2.csv", "r") as f:
    lines = f.readlines()
    with open("csv_out.csv", "w") as out:
        for line in lines:
            temp = line.split(',')
            if any(temp[0].startswith(x) for x in labels):
                out.write((',').join(temp))
The program first collects only the labels from csv1.csv. Note that you used readline, which reads a single line, whereas the program seems to expect all the lines from the file to be read at once. One way to do that is to use readlines, which returns the lines as a list; here the list is stored in lines. To collect the labels, the program loops through each line, splits it on "," and appends the first element to the list labels.
In the second part, the program reads all the lines from csv2.csv and opens the output file, csv_out.csv, for writing. It processes the lines from csv2.csv one by one, writing the matching lines to the output file as it goes. To do that, the program again splits each line on "," and checks whether the line's label matches one of the entries in labels (the code uses startswith, so a prefix match also counts). If it does, that line is written to csv_out.csv.
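As a rough sketch of the csv-specific route mentioned above, and assuming you want an exact match on the first column rather than the prefix match that startswith performs, something like the following might work. The function name filter_rows_by_first_column is made up for illustration, and io.StringIO objects stand in for csv1.csv and csv2.csv so the snippet runs on its own with the data from the question.

```python
import csv
import io


def filter_rows_by_first_column(reference_file, target_file):
    """Return rows of target_file whose first field appears in reference_file.

    Hypothetical helper: both arguments are open text files (or any
    iterable of lines) containing comma-separated data without headers.
    """
    # A set gives exact-match membership tests without scanning a list.
    labels = {row[0] for row in csv.reader(reference_file) if row}
    return [row for row in csv.reader(target_file) if row and row[0] in labels]


# In-memory stand-ins for csv1.csv and csv2.csv (data from the question).
csv1 = io.StringIO(
    "aaa,1,2,3,4,5\n"
    "bbb,2,3,4,5,6\n"
    "ccc,3,5,7,8,5\n"
    "ddd,4,6,5,8,9\n"
)
csv2 = io.StringIO(
    "bbb,1,2,3,4,5,,6,4,7\n"
    "kkk,2,3,4,5,6,5,4,5,6\n"
    "ccc,3,4,5,6,8,9,6,9,6\n"
    "aaa,1,2,3,4,6,6,4,6,4\n"
    "sss,1,2,3,4,5,3,5,3,5\n"
)

matches = filter_rows_by_first_column(csv1, csv2)
for row in matches:
    # Re-join the fields; empty fields are preserved as empty strings.
    print(",".join(row))
```

With real files you would pass the objects returned by open("csv1.csv", newline="") and open("csv2.csv", newline="") instead of the StringIO stand-ins.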
Upvotes: 1
Reputation: 849
EDIT
labels = []
with open("csv1.csv", "r") as f:
    # No f.readline() here: the files have no header row to skip.
    for line in f:
        labels.append(line.split(',')[0])
with open("csv2.csv", "r") as f:
    for line in f:
        if line.split(",")[0] in labels:
            print(line, end="")  # line already ends with a newline
I changed it so that labels only contains the first part of each line, so ['aaa', 'bbb', etc.].
Then you want to check whether line.split(",")[0] is in labels.
Since you only want to match on the first column, you should use split and then take the first item of the result, which is at index 0.
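To make the difference concrete, here is a small sketch that runs both comparisons on the data from the question, using lists of strings in place of the files. It leaves out the readline calls and only illustrates the membership test; the variable names are made up for the example.

```python
# Lines as they would be read from csv1.csv and csv2.csv (data from the question).
csv1_lines = [
    "aaa,1,2,3,4,5\n",
    "bbb,2,3,4,5,6\n",
    "ccc,3,5,7,8,5\n",
    "ddd,4,6,5,8,9\n",
]
csv2_lines = [
    "bbb,1,2,3,4,5,,6,4,7\n",
    "kkk,2,3,4,5,6,5,4,5,6\n",
    "ccc,3,4,5,6,8,9,6,9,6\n",
    "aaa,1,2,3,4,6,6,4,6,4\n",
    "sss,1,2,3,4,5,3,5,3,5\n",
]

# Approach from the question: whole lines are stored as labels, and the
# field at index 1 (the second column) is compared against them.
whole_line_labels = [line.strip("\n") for line in csv1_lines]
original_matches = [
    line for line in csv2_lines if line.split(",")[1] in whole_line_labels
]

# Corrected approach: store only the first field and compare the first field.
first_column_labels = [line.split(",")[0] for line in csv1_lines]
fixed_matches = [
    line for line in csv2_lines if line.split(",")[0] in first_column_labels
]

print(original_matches)                            # nothing matches
print([line.split(",")[0] for line in fixed_matches])
```

A value such as "1" can never equal a whole line such as "aaa,1,2,3,4,5", which is why the original version printed nothing.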
Upvotes: 0