Reputation: 61
I am trying to iterate through the .jpg files in a directory and match them against the names in a single column (image_name) of a .csv file.
import csv
import pandas as pd
import fnmatch
import os
imagenames=pd.read_csv('file.csv',header=0,usecols=['image_name'])
imnum=imagenames.shape[0]
for filename in os.listdir("directory"):
    for i in range(imnum):
        if imagenames.iloc[i] == filename:
            print(imagenames.iloc[i])
I get an error message: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Can anyone help me with the code?
Upvotes: 0
Views: 1020
Reputation: 210832
I'd do it this way:
import os
import glob
import pandas as pd
mask = r'/path/to/*.jpg'
jpgs = [os.path.split(f)[1] for f in glob.glob(mask)]
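# squeeze('columns') turns the one-column DataFrame into a Series
# (the squeeze=True argument of read_csv was removed in pandas 2.0)
imagenames = pd.read_csv('file.csv', usecols=['image_name']).squeeze('columns')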
print(imagenames[imagenames.isin(jpgs)])
Upvotes: 1
Reputation: 2638
Although you don't include the line numbers, I assume the error is on the line imagenames.iloc[i] == filename. You're getting this error because imagenames.iloc[i] results in a Pandas Series object (representing a single row), so the comparison produces a Series of booleans rather than a single True or False. You could resolve this by replacing it with imagenames.iloc[i]['image_name'], but the resulting code would still have two nested loops and do a lot of extra work.
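To make the cause of the error concrete, here is a minimal sketch. It assumes a small in-memory DataFrame as a stand-in for file.csv (the three file names are made up for illustration) and shows that indexing a row gives a Series, while also selecting the column gives a plain string:

```python
import pandas as pd

# Hypothetical stand-in for the contents of file.csv
imagenames = pd.DataFrame({"image_name": ["a.jpg", "b.jpg", "c.jpg"]})

row = imagenames.iloc[0]                  # a Series holding one row
comparison = row == "a.jpg"               # element-wise, so also a Series
value = imagenames.iloc[0]["image_name"]  # a plain Python string

print(type(row).__name__)         # Series
print(type(comparison).__name__)  # Series
print(value == "a.jpg")           # True

# Using the Series in an `if` is what raises the error from the question
try:
    if comparison:
        pass
except ValueError as exc:
    print("ValueError:", exc)
```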
Instead, I'd recommend refactoring so that the comparison is done in a single pass rather than in nested loops. There are several ways to do this, and you don't mention how large these lists are. Assuming they're relatively small, one approach that is more in line with Pandas' vectorized style would be:
imagenames=pd.read_csv('file.csv',header=0,usecols=['image_name'])
files_in_dir = os.listdir("directory")
matches = imagenames[imagenames['image_name'].isin(files_in_dir)]
This isn't super efficient, as .isin is searching through a list of files; if the list is quite long, it could be slow. If that is the case in your situation, you could consider using a set or another optimization.
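As a rough sketch of the set idea, assuming you are happy to drop to plain Python for the lookup: the helper name find_matches and the sample file names below are hypothetical, not part of your code.

```python
def find_matches(image_names, files_in_dir):
    """Return the names from image_names that also appear in files_in_dir."""
    # Building the set once makes each membership check O(1) on average
    file_set = set(files_in_dir)
    return [name for name in image_names if name in file_set]


# Made-up sample data standing in for the CSV column and the directory listing
names_from_csv = ["a.jpg", "x.jpg", "c.jpg"]
listing = ["c.jpg", "a.jpg", "notes.txt"]

print(find_matches(names_from_csv, listing))
```

With the real data, the same helper could be called as find_matches(imagenames['image_name'], os.listdir("directory")), since iterating over a Series yields its values.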
Upvotes: 1