Valderas
Reputation: 45

Create a column with values of a list and depending on another column

img_dir = ["img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathA.3.jpg", "img_pathB.1.jpg", "img_pathB.2.jpg", ..., "img_pathZ.3.jpg"]

df:

ID
A
B
C
..
Z

As you can see, the filename of every image path in the list contains the ID it belongs to.

I would like to add all the image paths for every ID in the dataframe. The goal is to get something like this:

final_df:

ID img_path
A img_pathA.1.jpg
A img_pathA.2.jpg
A img_pathA.3.jpg
B img_pathB.1.jpg
B img_pathB.2.jpg
.. ............
Z img_pathZ.3.jpg

The number of images per ID is not fixed (usually 2-3 images per ID), so I thought I could replicate the entire dataframe, say 3 times, assign a path to every row, and then delete the rows that don't have a path ("No path").

I have tried the following code:

df['img_path'] = "No path"
df = pd.concat([df]*3, ignore_index=True)

for ID in df['ID']:
    for path in img_dir:
        if ID in path:
            df.loc[(df['ID'] == ID), 'img_path'] = path

But I get something like the following. I think this happens because the ID is replicated too, so each assignment overwrites every row with that ID and the column ends up storing the last image for every ID:

ID img_path
A img_pathA.3.jpg
A img_pathA.3.jpg
A img_pathA.3.jpg
B img_pathB.2.jpg
B img_pathB.2.jpg
.. ............
Z img_pathZ.3.jpg

Any idea how I could solve or improve this?

Thank you in advance.

Upvotes: 1

Views: 656

Answers (1)

Shubham Sharma
Reputation: 71687

Create a series from the img_dir list, extract the ID from each path, and set the extracted IDs as the index of the series. Then join the dataframe with this series on the ID column. Because the series index contains each ID once per image, the join produces one row per image path:

s = pd.Series(img_dir)
s.index = s.str.extract(fr"({'|'.join(df['ID'])})", expand=False)

df.join(s.rename('img_path'), on='ID')

  ID          img_path
0  A   img_pathA.1.jpg
0  A   img_pathA.2.jpg
0  A   img_pathA.3.jpg
1  B   img_pathB.1.jpg
1  B   img_pathB.2.jpg
...
3  Z   img_pathZ.3.jpg
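
For reference, here is a self-contained sketch of the same approach that can be run as-is. The sample img_dir and df are made-up stand-ins for the real data, the ID "C" has no images on purpose, and the re.escape call is an extra precaution (not part of the snippet above) in case an ID contains regex metacharacters.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real list and dataframe.
img_dir = [
    "img_pathA.1.jpg", "img_pathA.2.jpg", "img_pathA.3.jpg",
    "img_pathB.1.jpg", "img_pathB.2.jpg",
]
df = pd.DataFrame({"ID": ["A", "B", "C"]})

# One alternation pattern built from the IDs, e.g. "(A|B|C)".
# re.escape is an added safeguard for IDs with special characters.
pattern = "(" + "|".join(re.escape(i) for i in df["ID"]) + ")"

# Index each path by the first ID found in it.
s = pd.Series(img_dir)
s.index = s.str.extract(pattern, expand=False)

# Left join on ID: one output row per matching path, NaN where an ID has none.
final_df = df.join(s.rename("img_path"), on="ID").reset_index(drop=True)
print(final_df)
```

IDs without any image keep a NaN in img_path; final_df.dropna(subset=["img_path"]) removes them, which replaces the "No path" cleanup step from the question. Note that the pattern takes the first ID that appears anywhere in the path, so very short IDs could match an unrelated part of a longer real-world path.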

Upvotes: 2
