Sam B.

Reputation: 3033

Pandas create a new column that states if file exists

I'm parsing a dataset in which some images are missing. The image files are named <id>.jpg, so I want to create a new column, exists, that holds True or False depending on whether each row's image file is present.

Instead I'm getting a Unicode-related TypeError:

import os
import pandas as pd
from pandas import Series
train = pd.read_csv('train.csv')

In [16]: train['exists'] = Series(str(os.path.isfile('training_images/' + train['id'] + '.jpg')))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-4ada5144d198> in <module>()
----> 1 train['exists'] = Series(str(os.path.isfile('training_images/' + train['id'] + '.jpg')))
/usr/lib/python2.7/genericpath.pyc in isfile(path)
     35     """Test whether a path is a regular file"""
     36     try:
---> 37         st = os.stat(path)
     38     except os.error:
     39         return False
TypeError: coercing to Unicode: need string or buffer, Series found

Upvotes: 2

Views: 1037

Answers (2)

jpp

Reputation: 164733

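os.path.isfile accepts a single path, not a whole Series, which is why you get the TypeError. I recommend building all the paths with vectorised string concatenation and then mapping os.path.isfile over them (this still calls isfile once per row):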

import os
train['filename'] = 'training_images' + os.sep + train['id'].astype(str) + '.jpg'
train['exists'] = train['filename'].map(os.path.isfile)  # one isfile call per path

The result will be a Boolean pd.Series.
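To check the approach above end to end, here is a minimal self-contained sketch. It assumes a throwaway directory created with tempfile in place of training_images, and a small hypothetical DataFrame in place of train.csv; only one of the two ids has an image file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for training_images/: a temporary directory
image_dir = tempfile.mkdtemp()

# Create an (empty) image file for id "a" only; "b" is deliberately missing
open(os.path.join(image_dir, 'a.jpg'), 'w').close()

# Hypothetical stand-in for train.csv
train = pd.DataFrame({'id': ['a', 'b']})

# Build every path at once with vectorised string concatenation
train['filename'] = image_dir + os.sep + train['id'].astype(str) + '.jpg'

# Call os.path.isfile once per path; the result is a Boolean Series
train['exists'] = train['filename'].map(os.path.isfile)

print(train['exists'].tolist())  # [True, False]
```

Keeping the filename column around also makes it easy to inspect which paths were checked when a result looks wrong.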

Upvotes: 3

Ken Syme

Reputation: 3642

You can use apply to call os.path.isfile on each id:

import os
train['exists'] = train['id'].apply(lambda x: os.path.isfile(os.path.join('training_images', str(x) + '.jpg')))
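A minimal runnable sketch of the apply approach, assuming a temporary directory in place of training_images and hypothetical integer ids (as read_csv would produce for a numeric id column). The str(x) call covers non-string ids, and os.path.join inserts the path separator.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for training_images/
image_dir = tempfile.mkdtemp()

# Only id 1 gets an image file; id 2 is missing
open(os.path.join(image_dir, '1.jpg'), 'w').close()

# Hypothetical integer ids, as read_csv would give for a numeric column
train = pd.DataFrame({'id': [1, 2]})

# apply calls the lambda once per id; str(x) converts numeric ids to text
train['exists'] = train['id'].apply(
    lambda x: os.path.isfile(os.path.join(image_dir, str(x) + '.jpg')))

print(train['exists'].tolist())  # [True, False]
```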

Upvotes: 0
