Reputation: 3033
I'm parsing a dataset where some of the images are not available, so I want to create a new column, exists, that holds True or False for each row depending on whether its image file is present.
The image names are <id>.jpg.
My attempt below raises a TypeError ("coercing to Unicode"):
import os
import pandas as pd
from pandas import Series
train = pd.read_csv('train.csv')
In [16]: train['exists'] = Series(str(os.path.isfile('training_images/' + train['id'] + '.jpg')))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-4ada5144d198> in <module>()
----> 1 train['exists'] = Series(str(os.path.isfile('training_images/' + train['id'] + '.jpg')))
/usr/lib/python2.7/genericpath.pyc in isfile(path)
35 """Test whether a path is a regular file"""
36 try:
---> 37 st = os.stat(path)
38 except os.error:
39 return False
TypeError: coercing to Unicode: need string or buffer, Series found
Upvotes: 2
Views: 1037
Reputation: 164733
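os.path.isfile expects a single path string, not a Series, which is why your call fails. I recommend building the paths with vectorised string concatenation and then mapping os.path.isfile over them, as below: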
train['filename'] = 'training_images' + os.sep + train['id'] + '.jpg'
train['exists'] = train['filename'].map(os.path.isfile)
The result will be a Boolean pd.Series.
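To see the approach end to end without the real data, here is a minimal sketch. The DataFrame contents and the temporary image directory are made up for illustration (they stand in for train.csv and training_images), and it assumes the id column holds strings:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for train.csv: three ids, only two of which have images.
train = pd.DataFrame({'id': ['a1', 'b2', 'c3']})

with tempfile.TemporaryDirectory() as image_dir:
    # Create empty placeholder files for two of the three ids.
    for name in ('a1', 'c3'):
        open(os.path.join(image_dir, name + '.jpg'), 'w').close()

    # str + Series concatenation is element-wise, so this builds one path per row.
    train['filename'] = image_dir + os.sep + train['id'] + '.jpg'
    # os.path.isfile takes a single path, so map calls it once per row.
    train['exists'] = train['filename'].map(os.path.isfile)

print(train['exists'].tolist())
```

Note that only the string concatenation is vectorised here; map still calls os.path.isfile once per row, which is unavoidable because each check hits the filesystem.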
Upvotes: 3
Reputation: 3642
You can use apply to call os.path.isfile once per id:
train['exists'] = train['id'].apply(lambda x: os.path.isfile('training_images/' + x + '.jpg'))
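One thing to watch for: if read_csv parsed the id column as integers, 'training_images/' + x would raise a TypeError inside the lambda. Below is a small sketch that guards against that with str(). The helper name image_exists, the sample ids and the temporary directory are all hypothetical, used only to make the example runnable:

```python
import os
import tempfile

import pandas as pd


def image_exists(image_id, image_dir):
    """Return True if <image_dir>/<image_id>.jpg is a regular file."""
    # str() makes this work whether the ids are strings or integers.
    return os.path.isfile(os.path.join(image_dir, str(image_id) + '.jpg'))


# Hypothetical data with integer ids; only image 2 is present.
train = pd.DataFrame({'id': [1, 2, 3]})

with tempfile.TemporaryDirectory() as image_dir:
    open(os.path.join(image_dir, '2.jpg'), 'w').close()
    # args passes image_dir as the second positional argument of the helper.
    train['exists'] = train['id'].apply(image_exists, args=(image_dir,))

# The Boolean column can then be used to pick out rows with missing images.
missing = train.loc[~train['exists'], 'id'].tolist()
print(missing)
```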
Upvotes: 0