Trying to match a filename in a directory and an element in a .csv file in python using pandas

Question

I am trying to iterate through the .jpg files in a directory and match them against the names in a single column (image_name) of a .csv file.

import csv
import pandas as pd
import fnmatch
import os


imagenames=pd.read_csv('file.csv',header=0,usecols=['image_name'])
imnum=imagenames.shape[0]

for filename in os.listdir("directory"):
    for i in range(imnum):
        if imagenames.iloc[i] == filename:
            print(imagenames.iloc[i])

I get an error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Can anyone help me with the code?

Peter Mularien · Accepted Answer

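Although you don't include the line numbers, I assume the error is on the line if imagenames.iloc[i] == filename:. You're getting this error because imagenames.iloc[i] returns a pandas Series object (representing a single row of the DataFrame), so the comparison produces a boolean Series rather than a single True or False, and the if statement cannot evaluate it.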

You could resolve this by replacing it with imagenames.iloc[i]['image_name'], which is a plain string, but the resulting code would still have two nested loops and do a lot of unnecessary work.
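To make the difference concrete, here is a minimal sketch. It assumes a small in-memory DataFrame as a hypothetical stand-in for the contents of file.csv; the sample names are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the CSV contents (not the asker's real data).
imagenames = pd.DataFrame({"image_name": ["a.jpg", "b.jpg", "c.jpg"]})

row = imagenames.iloc[0]        # a Series holding one row
comparison = row == "a.jpg"     # element-wise comparison: a boolean Series

error_message = None
try:
    if comparison:              # same situation as the loop in the question
        pass
except ValueError as exc:
    error_message = str(exc)    # "The truth value of a Series is ambiguous..."

name = imagenames.iloc[0]["image_name"]  # a plain Python string
is_match = name == "a.jpg"               # an ordinary bool, safe inside `if`
```

Selecting the column value first gives the if statement a single bool to work with, which is why the error goes away.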

Instead, I'd recommend refactoring with the following aim:

You have a list of filenames from the CSV
You have a list of filenames from the directory listing
You want the intersection of these two lists (i.e. filenames which appear in both)

There are several ways to do this, and you don't mention how large these lists are. Assuming they're relatively small, one approach that is more in line with pandas' vectorized style would be:

import os
import pandas as pd

imagenames = pd.read_csv('file.csv', header=0, usecols=['image_name'])
files_in_dir = os.listdir("directory")
matches = imagenames[imagenames['image_name'].isin(files_in_dir)]

This isn't especially efficient: .isin has to check each name against the list of files, so if that list is very long it could potentially be slow. If that applies to your situation, consider using a set or some other optimization.
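As a rough sketch of the set-based variant, the following wraps the lookup in a hypothetical helper, match_images, and also shows a plain set intersection for the case where only the names are needed. The temporary directory and sample names are assumptions made up for the demo, and whether a set is measurably faster than a list here depends on the pandas version and the size of the data.

```python
import os
import tempfile

import pandas as pd


def match_images(csv_names, directory):
    """Return the rows of csv_names whose image_name exists in directory."""
    files_in_dir = set(os.listdir(directory))  # set gives fast membership tests
    return csv_names[csv_names["image_name"].isin(files_in_dir)]


def common_names(csv_names, directory):
    """Return just the names present in both places, as a set."""
    return set(csv_names["image_name"]) & set(os.listdir(directory))


# Hypothetical demo data: a throwaway directory and an in-memory frame.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.jpg", "c.jpg", "notes.txt"):
        open(os.path.join(tmp, fname), "w").close()
    imagenames = pd.DataFrame({"image_name": ["a.jpg", "b.jpg", "c.jpg"]})
    matched_names = match_images(imagenames, tmp)["image_name"].tolist()
    shared = common_names(imagenames, tmp)
```

The first helper keeps the result as a DataFrame, which is convenient if the CSV has other columns you want to carry along; the second is enough if you only need the filenames.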
