David

Reputation: 8298

Splitting a column of paths into two columns based on the directory and the file name

I have the following DataFrame:

    image_path
0      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s154-0001procstk.tif
1      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s153-0001procstk.tif
2      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s161-0001procstk.tif
3      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s160-0001procstk.tif
4      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s155-0001procstk.tif
5      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s158-0001procstk.tif
6      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s157-0001procstk.tif
7      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s159-0001procstk.tif
8      /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s156-0001procstk.tif
9   /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s158-learning_01.tif
10  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s157-learning_01.tif
11  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s159-learning_01.tif
12  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s156-learning_01.tif
13  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s161-learning_01.tif
14  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s160-learning_01.tif

I wish to split the data into two columns, one holding the RAW image path and the other holding the path of its corresponding segmentation map.

There are two issues here:

  1. There are more RAW images than segmentation maps, so I want to keep only the ones that appear in both directories. Two files are matched on the ...Brightfield_s154... part of the name; whatever comes after sXXX is not relevant.
  2. The RAW images are in the 07_CSWAT_plate2 directory, while the segmentation maps are in 07_CSWAT_plate2_DL.

I was able to create a column holding the source directory using:

all_files["source"] = all_files['image_path'].map(lambda x: x.split("/")[-2])

Then I separated into 2 groups based on the directory:

all_files.groupby("source")

I then got stuck on how to build a DataFrame with two columns in which each row refers to the same image: the first column holds the RAW image path and the second holds the segmentation map path.

The expected output is:

    raw                                                                                                                                       seg
0  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s156-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s156-learning_01.tif
1  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s157-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s157-learning_01.tif
2  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s158-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s158-learning_01.tif
3  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s159-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s159-learning_01.tif
4  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s160-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s160-learning_01.tif
5  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s161-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s161-learning_01.tif

Would appreciate some help

Upvotes: 1

Views: 82

Answers (1)

Shubham Sharma

Reputation: 71689

  • Use Series.str.extract to extract the key from the column image_path on the basis of which you want to compare the file paths.

  • Use boolean masking with Series.str.contains to filter the corresponding raw and segment file paths.

  • Use DataFrame.merge to merge these raw and segment paths based on the extracted key.

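# Key that identifies an image in both directories, e.g. "Brightfield_s154"
k = df['image_path'].str.extract(r'(Brightfield_s\d+)', expand=False)

# The trailing slash keeps '07_CSWAT_plate2/' from also matching '07_CSWAT_plate2_DL/'
r = df[df['image_path'].str.contains('07_CSWAT_plate2/')]
s = df[df['image_path'].str.contains('07_CSWAT_plate2_DL/')]

# Inner merge keeps only keys found in both directories.
# Keyword arguments are used because newer pandas versions reject a positional axis.
d = r.assign(key=k).merge(s.assign(key=k), on='key')\
                   .drop(columns='key').set_axis(['raw', 'seg'], axis=1)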

Result:

    raw                                                                                                                                    seg
0  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s161-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s161-learning_01.tif
1  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s160-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s160-learning_01.tif
2  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s158-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s158-learning_01.tif
3  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s157-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s157-learning_01.tif
4  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s159-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s159-learning_01.tif
5  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s156-0001procstk.tif  /Users/davidsriker/Desktop/ThesisWIZ/Segmentation/SampleImages/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s156-learning_01.tif
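
The result above follows the row order of the raw files, while the expected output in the question is ordered by the sXXX key. A minimal, self-contained sketch of the same extract / filter / merge idea with an added sort might look like this. The shortened `/data/SampleImages` base path and the variable names (`keyed`, `raw`, `seg`, `pairs`) are made up for illustration, and comparing the parent directory name exactly is a swapped-in alternative to the `str.contains` filter:

```python
import pandas as pd

# Hypothetical shortened paths standing in for the real ones in the question.
base = "/data/SampleImages"
paths = (
    [f"{base}/07_CSWAT_plate2/171107_Plate_2_1_w1cf-Brightfield_s{n}-0001procstk.tif"
     for n in (154, 153, 161, 160)]
    + [f"{base}/07_CSWAT_plate2_DL/171107_Plate_2_1_w1cf-Brightfield_s{n}-learning_01.tif"
       for n in (160, 161)]
)
df = pd.DataFrame({"image_path": paths})

# Key shared by a raw image and its segmentation map, e.g. "Brightfield_s160".
key = df["image_path"].str.extract(r"(Brightfield_s\d+)", expand=False)
# Name of the directory that directly contains each file.
source = df["image_path"].str.split("/").str[-2]

keyed = df.assign(key=key)
raw = keyed[source == "07_CSWAT_plate2"].rename(columns={"image_path": "raw"})
seg = keyed[source == "07_CSWAT_plate2_DL"].rename(columns={"image_path": "seg"})

# Inner merge drops raw images that have no segmentation map.
pairs = (
    raw.merge(seg, on="key", how="inner")
       .sort_values("key")          # plain string sort on the key
       .drop(columns="key")
       .reset_index(drop=True)
)
print(pairs)
```

Note that `sort_values("key")` sorts the keys as strings, so it only gives numeric order when all sXXX numbers have the same number of digits, as they do in the question's data.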

Upvotes: 2
