Reputation: 45
img_dir = ["img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathA.3.jpg", "img_pathB.1.jpg", "img_pathB.2.jpg", ..., "img_pathZ.3.jpg"]
and a dataframe df with an ID column:
ID |
---|
A |
B |
C |
.. |
Z |
As you can see, every image path in the list contains, in its filename, the ID it belongs to.
I would like to add all the image paths for every ID in the dataframe. The goal is to get something like this:
final_df:
ID | img_path |
---|---|
A | img_pathA.1.jpg |
A | img_pathA.2.jpg |
A | img_pathA.3.jpg |
B | img_pathB.1.jpg |
B | img_pathB.2.jpg |
.. | ............ |
Z | img_pathZ.3.jpg |
The number of images per ID is not fixed (usually 2-3 images per ID), so I thought I could replicate the entire dataframe, maybe 3 times, do the assignment for every row, and afterwards delete the rows that don't have a path ("No path").
I have tried the following code:
df['img_path'] = "No path"
df = pd.concat([df]*3, ignore_index=True)
for ID in df['ID']:
    for path in img_dir:
        if ID in path:
            df.loc[(df['ID'] == ID), 'img_path'] = path
But I get something like the following. I think it is because the ID is replicated too, so the column ends up storing the last image for every ID:
ID | img_path |
---|---|
A | img_pathA.3.jpg |
A | img_pathA.3.jpg |
A | img_pathA.3.jpg |
B | img_pathB.2.jpg |
B | img_pathB.2.jpg |
.. | ............ |
Z | img_pathZ.3.jpg |
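The overwrite can be reproduced on a tiny example. This is only a sketch with hypothetical sample data (two IDs, the dataframe replicated twice instead of three times): the mask df['ID'] == ID selects every replicated row of that ID at once, so each assignment replaces the value in all of them and the last matching path wins.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
img_dir = ["img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathB.1.jpg"]
df = pd.DataFrame({"ID": ["A", "B"]})
df["img_path"] = "No path"
df = pd.concat([df] * 2, ignore_index=True)  # each ID now appears twice

mask = df["ID"] == "A"           # True for BOTH replicated rows of A
rows_selected = int(mask.sum())  # how many rows one assignment touches

for path in img_dir:
    if "A" in path:
        # Writes the same path into every row selected by the mask,
        # replacing whatever the previous iteration stored there.
        df.loc[mask, "img_path"] = path

a_paths = df.loc[mask, "img_path"].tolist()
print(rows_selected, a_paths)
```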
Any idea how I could solve or improve this?
Thank you in advance.
Upvotes: 1
Views: 656
Reputation: 71687
Create a series from the img_dir list, then extract the ID from each path and set the extracted ID as the index of the series. Then join the dataframe with this series on the ID column:
s = pd.Series(img_dir)
s.index = s.str.extract(fr"({'|'.join(df['ID'])})", expand=False)
df.join(s.rename('img_path'), on='ID')
ID img_path
0 A img_pathA.1.jpg
0 A img_pathA.2.jpg
0 A img_pathA.3.jpg
1 B img_pathB.1.jpg
1 B img_pathB.2.jpg
...
3 Z img_pathZ.3.jpg
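For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The sample img_dir and df are hypothetical stand-ins for the real data, and two things are additions rather than part of the snippet above: re.escape (in case an ID contains regex metacharacters) and the final reset_index (to replace the repeated index values with a fresh 0..n-1 index).

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the question's img_dir and df.
img_dir = [
    "img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathA.3.jpg",
    "img_pathB.1.jpg", "img_pathB.2.jpg",
    "img_pathC.1.jpg",
]
df = pd.DataFrame({"ID": ["A", "B", "C"]})

# Alternation pattern built from the IDs, e.g. "(A|B|C)".
# re.escape is an added safeguard, not part of the original snippet.
pattern = "(" + "|".join(map(re.escape, df["ID"])) + ")"

# Series of paths, indexed by the ID extracted from each path.
s = pd.Series(img_dir, name="img_path")
s.index = s.str.extract(pattern, expand=False)

# Joining on ID yields one row per (ID, path) pair, so the number of
# images per ID does not need to be known in advance.
final_df = df.join(s, on="ID").reset_index(drop=True)
print(final_df)
```

Note that extract returns the first match in each path, so this assumes an ID does not also appear earlier in the path (for example in a directory name). An ID with no matching image keeps its row with NaN in img_path; dropna(subset=['img_path']) removes those rows if they are not wanted.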
Upvotes: 2