user_01

Reputation: 457

Searching rows of a file in another file and printing appropriate rows in python

I have a csv file like this: (no headers)

aaa,1,2,3,4,5  
bbb,2,3,4,5,6
ccc,3,5,7,8,5
ddd,4,6,5,8,9

I want to search another csv file: (no headers)

bbb,1,2,3,4,5,,6,4,7
kkk,2,3,4,5,6,5,4,5,6
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4
sss,1,2,3,4,5,3,5,3,5

and print the rows of the second file whose first column also appears in the first column of the first file. So the result should be:

bbb,1,2,3,4,5,,6,4,7
ccc,3,4,5,6,8,9,6,9,6
aaa,1,2,3,4,6,6,4,6,4 

I have the following code, but it does not print anything:

labels = []
with open("csv1.csv", "r") as f:

    f.readline()
    for line in f:
        labels.append((line.strip("\n")))

with open("csv2.csv", "r") as f:

    f.readline()
    for line in f:
        if (line.split(",")[1]) in labels:
            print (line)

Could you tell me how to do this, and what is wrong with my code? Thanks in advance!

Upvotes: 1

Views: 57

Answers (2)

atru

Reputation: 4744

This is one solution, although you may also look into csv-specific tools and pandas as suggested:

labels = []
with open("csv1.csv", "r") as f:
    lines = f.readlines()
    # Collect the first field of every row in csv1.csv
    for line in lines:
        labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    lines = f.readlines()

with open("csv_out.csv", "w") as out:
    for line in lines:
        temp = line.split(',')
        # Keep the row only if its first field exactly matches a label
        if temp[0] in labels:
            out.write(line)

The program first collects only the labels from csv1.csv. Note that you used readline, which reads a single line; in your code that call consumes the first row, and since the files have no header, that row is lost. One way to read all the lines at once is readlines, which returns them as a list, stored here in lines. To collect the labels, the program loops through each line, splits it on a comma, and appends the first element to the list labels.
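To illustrate the readline remark above, here is a small sketch that uses an in-memory file in place of csv1.csv (io.StringIO is only a stand-in so the example runs without files on disk). A single readline() call consumes one line, and a for loop afterwards resumes at the next one:

```python
import io

# In-memory stand-in for csv1.csv, using the data from the question.
f = io.StringIO("aaa,1,2,3,4,5\nbbb,2,3,4,5,6\nccc,3,5,7,8,5\nddd,4,6,5,8,9\n")

skipped = f.readline()                            # consumes only the first line
remaining = [line.split(",")[0] for line in f]    # iteration resumes at line 2

print(repr(skipped))
print(remaining)
```

With header-less files like these, the row read by readline() never reaches the loop.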

In the second part, the program reads all the lines from csv2.csv and then opens the output file, csv_out.csv, for writing. It processes the lines from csv2.csv one by one, writing the matching lines to the output file.

To do that, the program again splits each line on a comma and checks whether the label from csv2.csv is found in the labels list. If it is, that line is written to csv_out.csv.
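As for the csv-specific tools mentioned at the top, the same idea could be sketched with the standard csv module and a set of labels. This is only one possible variant, not a drop-in replacement: the function name filter_rows is made up for illustration, and io.StringIO objects stand in for the real files so the example is self-contained.

```python
import csv
import io

def filter_rows(labels_file, data_file, out_file):
    """Copy rows of data_file whose first field appears in labels_file."""
    # A set gives exact matching and fast lookups; blank rows are skipped.
    labels = {row[0] for row in csv.reader(labels_file) if row}
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(data_file):
        if row and row[0] in labels:
            writer.writerow(row)

# In-memory stand-ins for csv1.csv and csv2.csv from the question.
csv1 = io.StringIO("aaa,1,2,3,4,5\nbbb,2,3,4,5,6\nccc,3,5,7,8,5\nddd,4,6,5,8,9\n")
csv2 = io.StringIO(
    "bbb,1,2,3,4,5,,6,4,7\n"
    "kkk,2,3,4,5,6,5,4,5,6\n"
    "ccc,3,4,5,6,8,9,6,9,6\n"
    "aaa,1,2,3,4,6,6,4,6,4\n"
    "sss,1,2,3,4,5,3,5,3,5\n"
)
out = io.StringIO()
filter_rows(csv1, csv2, out)
result = out.getvalue()
print(result, end="")
```

With real files you would pass objects from open("csv1.csv", newline="") and so on; the csv documentation recommends newline="" when opening files for the reader and writer.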

Upvotes: 1

Samantha

Reputation: 849

  • Try using pandas; it's a very effective way to read csv files into a data structure called a DataFrame.

EDIT

labels = []
with open("csv1.csv", "r") as f:
    # The files have no header row, so read every line
    for line in f:
        labels.append(line.split(',')[0])

with open("csv2.csv", "r") as f:
    for line in f:
        if line.split(",")[0] in labels:
            # line already ends with a newline
            print(line, end="")

I changed it so that labels only contains the first field of each line, e.g. ['aaa', 'bbb', ...]

Then you want to check if line.split(",")[0] is in labels

Since you want to only match it based on the first column, you should use split and then get the first item from the split which is at index 0.
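To see why the code in the question prints nothing, it may help to look at one row from each file in isolation. The sketch below hard-codes two rows from the question; the variable names are only illustrative.

```python
# One row from each file, as the question's loops would see them.
labels = ["bbb,2,3,4,5,6"]            # line.strip("\n") keeps the entire row
line = "bbb,1,2,3,4,5,,6,4,7\n"

second_field = line.split(",")[1]     # '1', the second column
first_field = line.split(",")[0]      # 'bbb', the first column

# The question's test: a single field compared against whole rows.
original_check = second_field in labels

# The corrected test: first field compared against first fields.
fixed_labels = [row.split(",")[0] for row in labels]
fixed_check = first_field in fixed_labels

print(original_check, fixed_check)
```

The first check can never succeed, because a single field such as '1' is never equal to a whole row.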

Upvotes: 0
