Create a column with values of a list and depending on another column

Question

I have a list of paths of different images:

img_dir = ['img_pathA.1.jpg', 'img_pathA.2.jpg', 'img_pathA.3.jpg', 'img_pathB.1.jpg', 'img_pathB.2.jpg', ..., 'img_pathZ.3.jpg']

And a dataframe with an ID column:

df:

ID
A
B
C
..
Z

As you can see, the filename of every image path in the list contains the ID it belongs to.

I would like to add all the image paths for every ID in the dataframe. The goal is to get something like this:

final_df:

ID	img_path
A	img_pathA.1.jpg
A	img_pathA.2.jpg
A	img_pathA.3.jpg
B	img_pathB.1.jpg
B	img_pathB.2.jpg
..	............
Z	img_pathZ.3.jpg

The number of images per ID is not fixed (usually 2-3 images per ID), so I thought I could replicate the entire dataframe, maybe 3 times, do the assignment for every row and, after that, delete the rows that don't have a path ("No path").

I have tried the following code:

df['img_path'] = "No path"
df = pd.concat([df]*3, ignore_index=True)

for ID in df['ID']:
    for path in img_dir:
        if ID in path:
            df.loc[(df['ID'] == ID), 'img_path'] = path

But I get something like this. I think that it's because the ID gets replicated too and the column seems to store the last image for every ID:

ID	img_path
A	img_pathA.3.jpg
A	img_pathA.3.jpg
A	img_pathA.3.jpg
B	img_pathB.2.jpg
B	img_pathB.2.jpg
..	............
Z	img_pathZ.3.jpg

Any idea how I could solve or improve this?

Thank you in advance.

Shubham Sharma · Accepted Answer

Create a series from the img_dir list, extract the ID from each path and set the extracted IDs as the index of the series. Then join the dataframe with this series on the ID column:

s = pd.Series(img_dir)
s.index = s.str.extract(fr"({'|'.join(df['ID'])})", expand=False)

df.join(s.rename('img_path'), on='ID')

  ID          img_path
0  A   img_pathA.1.jpg
0  A   img_pathA.2.jpg
0  A   img_pathA.3.jpg
1  B   img_pathB.1.jpg
1  B   img_pathB.2.jpg
...
3  Z   img_pathZ.3.jpg
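
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample img_dir and df below are made up for illustration (only IDs A to C), and the re.escape call and the final reset_index are my additions, not something the question requires. Escaping only matters if an ID could contain regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the question's img_dir and df.
img_dir = [
    "img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathA.3.jpg",
    "img_pathB.1.jpg", "img_pathB.2.jpg",
    "img_pathC.1.jpg",
]
df = pd.DataFrame({"ID": ["A", "B", "C"]})

# One capture group that matches any of the IDs. re.escape is an added
# safeguard (assumption: IDs might contain regex metacharacters).
pattern = "(" + "|".join(re.escape(i) for i in df["ID"]) + ")"

paths = pd.Series(img_dir, name="img_path")
# Index each path by the first ID found in it (NaN when nothing matches).
paths.index = paths.str.extract(pattern, expand=False)

# Joining on ID repeats each ID row once per matching path.
# reset_index is optional; it replaces the repeated index with 0..n-1.
final_df = df.join(paths, on="ID").reset_index(drop=True)
print(final_df)
```

Two caveats, both depending on what your real IDs look like: str.extract keeps only the first match in each path, so short IDs can match unrelated characters earlier in the path, and regex alternation tries the alternatives in order, so if one ID is a prefix of another (say "A" and "AB") it may help to sort the IDs by length, longest first, before building the pattern.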
