cpwah

Reputation: 141

Search in a csv file

I am writing a script that reads files from different directories; then I am using the file ID to search in the csv file. Here is the piece of code.

import os
import glob

searchfile = open("file.csv", "r")
train_file = open('train.csv','w')



listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
searchfile.close()
train_file.close()

However, I am only able to find a couple of IDs from the csv file. Can someone point out my mistake? (Please see the comments for a description.)

EDITED

A sample line from the csv file:

192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg

Upvotes: 1

Views: 93

Answers (2)

asongtoruin

Reputation: 10359

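Your issue is that when you do for line in searchfile: you're looping over a file object, which is an iterator that can only be consumed once. The file doesn't reset for every id: for example, if the first id you pass to it is found on line 50, the search for the next id starts at line 51. Once a search reaches the end of the file without finding a match, every later search sees no lines at all.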
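A minimal sketch of this behaviour, using an in-memory io.StringIO object as a stand-in for the open file (the three rows and the find helper are made up for illustration):

```python
import io

# In-memory stand-in for the open csv file (hypothetical contents).
searchfile = io.StringIO(u"111,first\n222,second\n333,third\n")

def find(file_obj, wanted):
    """Return the first remaining line containing `wanted`, or None."""
    for line in file_obj:
        if wanted in line:
            return line.strip()
    return None

first = find(searchfile, "222")   # reads lines 1 and 2, stops after the match
second = find(searchfile, "111")  # resumes at line 3, so line 1 is never seen
third = find(searchfile, "333")   # the iterator is already exhausted
```

Only the first search succeeds, even though all three IDs are present in the data.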

Instead, you can read the file into a list and loop over that:

import os
import glob

with open("file.csv", "r") as s:
    search_file = s.readlines()

train_file = open('train.csv', 'w')

list_of_files = os.listdir("train")
for l in list_of_files:
    dir_list = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dir_list:
        fname = os.path.splitext(os.path.basename(d))
        print fname[0] # ID
        for line in search_file:
            if fname[0] in line: # search in csv file
                value = line.split(",") 
                value = value[1]+" " + value[2] + "\n"
                train_file.write(fname[0]+","+value) # write description
                break

train_file.close()

I made a couple of other changes too. Firstly, you shouldn't use the name id, as it shadows a Python built-in function; I picked fname instead to indicate the file name. Secondly, I changed your camelCase names to lowercase with underscores, as is the convention in Python (PEP 8). Finally, combining os.path.basename and os.path.splitext is a neater and more robust way to get the file name without its extension.
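To illustrate the last point, here is a small comparison of the two ways of extracting the ID. The paths are hypothetical; the second one contains an extra dot to show where the split(".") approach goes wrong:

```python
import os

path = "/train/some_dir/192397335.jpg"       # hypothetical path
dotted = "/train/some_dir/1923.97335.jpg"    # hypothetical name with an extra dot

# basename drops the directories, splitext splits off only the last extension.
stem, ext = os.path.splitext(os.path.basename(path))
dotted_stem, dotted_ext = os.path.splitext(os.path.basename(dotted))

# The manual approach keeps only the text before the FIRST dot.
naive_dotted = dotted.split("/")[-1].split(".")[0]
```

For the plain name both approaches give the same ID, but for the dotted name the manual split truncates it while os.path.splitext keeps everything before the final extension.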

Upvotes: 1

Serge Ballesta

Reputation: 148890

You need to scan the lines of searchfile for each id found, but because you open the file once, outside the loop, each line is read only once over the whole run.

You should either load the whole file into a list and iterate over that list inside the loop or, if searchfile is so large that it would not comfortably fit in memory, reopen the file inside the loop:

List version:

import os
import glob

with open("file.csv", "r") as searchfile:
    searchlines = searchfile.readlines()

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        for line in searchlines:   # now a list so start at the beginning on each pass
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
train_file.close()
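If the file fits in memory anyway, a possible variation on the list version is to index the rows by ID in a dictionary, so that each lookup is a single key access instead of a scan over every line. The sketch below is written for Python 3 and assumes the ID is always the first column; build_index, the second sample row, and the image_ids list are made up for illustration. Unlike the `in line` test, it matches the ID column exactly, so an ID that happens to appear inside another field is not picked up.

```python
import csv
import io

# Stand-in for file.csv: the first row is the sample from the question,
# the second row is invented for this example.
csv_text = (
    "192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg\n"
    "100000001,hypothetical item,blue,9.99,100000001.jpg\n"
)

def build_index(file_obj):
    """Map each ID (column 1) to 'column 2 + space + column 3'."""
    index = {}
    for row in csv.reader(file_obj):
        if len(row) >= 3:          # skip blank or malformed rows
            index[row[0]] = row[1] + " " + row[2]
    return index

index = build_index(io.StringIO(csv_text))

# IDs as they would be extracted from the image file names (hypothetical).
image_ids = ["100000001", "192397335", "555"]
found = [(i, index[i]) for i in image_ids if i in index]
```

The order of the IDs no longer matters, and IDs that are missing from the csv file are simply skipped.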

Re-open version:

import os
import glob

train_file = open('train.csv','w')

listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0] # ID
        searchfile = open("file.csv", "r")
        for line in searchfile:
            if id[0] in line: # search in csv file
                value= line.split(",") 
                value= value[1]+" "+ value[2] + "\n"
                train_file.write(id[0]+","+value) # write description
                break
        searchfile.close()
train_file.close()
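A further option, not used in either version above, is to keep the file open and rewind it with seek(0) before each search, which avoids both holding all lines in memory and reopening the file. The following is only a sketch under stated assumptions: the rows are made up and written to a temporary file so the example is self-contained, lookup is a hypothetical helper, and it compares the first column exactly rather than using a substring test.

```python
import os
import tempfile

# Made-up sample data written to a temporary file.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False)
tmp.write("111,first,,1.0,111.jpg\n222,second,,2.0,222.jpg\n")
tmp.close()

def lookup(file_obj, wanted):
    """Rewind the file, then return the first line whose ID column equals `wanted`."""
    file_obj.seek(0)               # start from the top on every search
    for line in file_obj:
        if line.split(",")[0] == wanted:
            return line.rstrip("\n")
    return None

with open(tmp.name, "r") as searchfile:
    results = [lookup(searchfile, i) for i in ["222", "111", "999", "222"]]

os.remove(tmp.name)                # clean up the temporary file
```

Because every search starts at the first line, the order of the IDs and any failed lookups no longer affect later searches.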

Upvotes: 1
