Reputation: 778
I have two lists that contain file paths:
lst_A =['/home/data_A/test_AA_123.jpg',
'/home/data_A/test_AB_234.jpg',
'/home/data_A/test_BB_321.jpg',
'/home/data_A/test_BC_112.jpg',
]
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_BC_777.jpg',
]
Based on lst_A, I want to sort lst_B so that, at each position, the first and second parts of the basename (here test_xx) are the same in both lists. So the expected sorted lst_B is:
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_BC_777.jpg',
]
In addition, I want an indicator array marking the positions where both lists have the same first and second parts of the basename (such as test_xx), so the indicator should be
array_same =[1,1,0,1]
How can I do this in Python? I tried the .sort() function, but it returns an unexpected result. Thanks
Update: this is my attempt:
import os
lst_A =['/home/data_A/test_AA_123.jpg',
'/home/data_A/test_AB_234.jpg',
'/home/data_A/test_BB_321.jpg',
'/home/data_A/test_BC_112.jpg',
]
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_BC_777.jpg']
lst_B_sort=[]
same_array=[]
for ind_a, a_name in enumerate(lst_A):
    for ind_b, b_name in enumerate(lst_B):
        print (os.path.basename(b_name).split('_')[1])
        if os.path.basename(b_name).split('_')[1] in os.path.basename(a_name):
            lst_B_sort.append(b_name)
            same_array.append(1)
print(lst_B_sort)
print(same_array)
Output: ['/home/data_B/test_AA_222.jpg', '/home/data_B/test_AB_555.jpg', '/home/data_B/test_BC_777.jpg']
[1, 1, 1]
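The element of lst_B whose name has no match (test_CC_444.jpg) is missing, and there is no 0 in the indicator, because I never add anything when there is no match.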
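A minimal sketch of how the nested-loop attempt above could be completed, assuming the shared name is everything before the last underscore of the basename (e.g. test_AA). It compares whole names instead of substrings, uses Python's for/else so that a None placeholder and a 0 are recorded when the inner loop finds no match, and finally fills the placeholders with the unmatched lst_B elements in their original order. The helper name_of is hypothetical, not part of the question.

```python
import os

lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']
lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']


def name_of(path):
    """Hypothetical helper: 'test_AA' from '/home/data_A/test_AA_123.jpg'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.rsplit('_', 1)[0]


lst_B_sort = []
same_array = []
used = set()
for a_name in lst_A:
    for b_name in lst_B:
        # Compare whole names, so 'AA' cannot accidentally match 'AAA'.
        if b_name not in used and name_of(b_name) == name_of(a_name):
            lst_B_sort.append(b_name)
            same_array.append(1)
            used.add(b_name)
            break
    else:
        # Runs only if the inner loop ended without break: no match found.
        lst_B_sort.append(None)
        same_array.append(0)

# Fill the placeholders with the unmatched paths, in lst_B order.
leftovers = (b for b in lst_B if b not in used)
lst_B_sort = [b if b is not None else next(leftovers, None) for b in lst_B_sort]

print(lst_B_sort)
print(same_array)
```

If lst_B has fewer unmatched paths than there are placeholders, the remaining slots simply stay None.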
Upvotes: 1
Views: 608
Reputation: 44465
We will discuss the issue with a SIMPLE technique followed by an APPLIED solution.
SIMPLE
We just focus on sorting the names given a key.
Given
Simple names and a key list:
lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()
key_list = [1, 1, 0, 1, 0]
Code
same = sorted(set(lst_a) & set(lst_b))
diff = sorted(set(lst_b) - set(same))
isame, idiff = iter(same), iter(diff)
[next(isame) if x else next(idiff) for x in key_list]
# ['AA', 'AB', 'CC', 'BC', 'DD']
lst_b gets sorted so that the elements it shares with lst_a fill the positions marked 1 in key_list, and the remnants fill the positions marked 0.
Details
This problem mainly reduces to sorting the intersection of names from both lists. The intersection is a set of common elements called same. The remnants are in a set called diff. We sort same and diff, and here's what they look like:
same
# ['AA', 'AB', 'BC']
diff
# ['CC', 'DD']
Now we just want to pull a value from either list, in order, according to the key. We iterate over key_list: if the key is 1, pull from the isame iterator; otherwise, pull from idiff.
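Here key_list is given as input, whereas in the question it is the indicator the asker wants to compute (array_same). A minimal sketch of deriving it, assuming a position gets 1 when that element of lst_a also occurs anywhere in lst_b, and then running the same pulling step with it:

```python
lst_a = "AA AB BB BC EE".split()
lst_b = "AA DD CC AB BC".split()

names_b = set(lst_b)
# Assumption: 1 where the element of lst_a also appears somewhere in lst_b.
key_list = [int(x in names_b) for x in lst_a]

# Same pulling step as above, now driven by the derived key list.
same = sorted(set(lst_a) & names_b)
diff = sorted(names_b - set(same))
isame, idiff = iter(same), iter(diff)
result = [next(isame) if x else next(idiff) for x in key_list]

print(key_list)  # [1, 1, 0, 1, 0]
print(result)    # ['AA', 'AB', 'CC', 'BC', 'DD']
```

Note that sorted(same) follows lst_a's order here only because the sample names are already alphabetical; for other data, [x for x in lst_a if x in names_b] would keep lst_a's order instead.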
Now that we have the basic technique, we can apply it to the more complicated path example.
APPLIED
Applying this idea to more complicated path-strings:
Given
import pathlib
lst_a = "foo/t_AA_a.jpg foo/t_AB_a.jpg foo/t_BB_a.jpg foo/t_BC_a.jpg foo/t_EE_a.jpg".split()
lst_b = "foo/t_AA_b.jpg foo/t_DD_b.jpg foo/t_CC_b.jpg foo/t_AB_b.jpg foo/t_BC_b.jpg".split()
key_list = [1, 1, 0, 1, 0]
# Helper
def get_name(s_path):
    """Return the shared 'name' from a string path.

    Examples
    --------
    >>> get_name("foo/test_xx_a.jpg")
    'test_xx'
    """
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]
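A quick check of get_name on paths shaped like the ones in the question. This is a sketch that assumes each basename ends in _<suffix>.<ext>; Path.stem drops the directory and extension, and rsplit with maxsplit=1 removes only the last _<suffix> part, so a stem with no underscore comes back whole:

```python
import pathlib


def get_name(s_path):
    """Same helper as above: strip the directory, extension and last '_' part."""
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]


names = [get_name(p) for p in ['/home/data_A/test_AA_123.jpg',
                               '/home/data_B/test_CC_444.jpg',
                               'foo/plain.jpg']]
print(names)  # ['test_AA', 'test_CC', 'plain']
```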
Code
Map the names to paths:
name_path_a = {get_name(p): p for p in lst_a}
name_path_b = {get_name(p): p for p in lst_b}
The names are dict keys, and dict key views support set operations, so use them directly in place of sets:
same = sorted(name_path_a.keys() & name_path_b.keys())
diff = sorted(name_path_b.keys() - set(same))
isame, idiff = iter(same), iter(diff)
Get the paths via names pulled from iterators:
[name_path_b[next(isame)] if x else name_path_b[next(idiff)] for x in key_list]
Output
['foo/t_AA_b.jpg',
'foo/t_AB_b.jpg',
'foo/t_CC_b.jpg',
'foo/t_BC_b.jpg',
'foo/t_DD_b.jpg']
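The same idea applied to the lists from the question, as a sketch. It departs from the answer in two labeled assumptions: the indicator (array_same in the question) is derived from the data instead of being given, and shared names follow lst_A's order rather than alphabetical order, with leftovers taken in lst_B's order. Names are also assumed to be unique within lst_B.

```python
import pathlib

lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']
lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']


def get_name(s_path):
    """Return the shared name, e.g. 'test_AA' from '.../test_AA_123.jpg'."""
    return pathlib.Path(s_path).stem.rsplit("_", maxsplit=1)[0]


names_a = [get_name(p) for p in lst_A]
names_a_set = set(names_a)
# Assumes names are unique within lst_B (a duplicate would overwrite).
name_path_b = {get_name(p): p for p in lst_B}

# Assumption: the indicator is computed (1 = name also present in lst_B).
array_same = [int(n in name_path_b) for n in names_a]

# Unmatched lst_B names, in lst_B order (dicts keep insertion order).
leftovers = (n for n in name_path_b if n not in names_a_set)

# Matched slots follow lst_A; each 0 slot takes the next leftover,
# or None if lst_B runs out of unmatched paths.
lst_B_sort = [name_path_b[n] if flag else name_path_b.get(next(leftovers, None))
              for n, flag in zip(names_a, array_same)]

print(lst_B_sort)
print(array_same)  # [1, 1, 0, 1]
```

Because the result has one slot per element of lst_A, any unmatched lst_B paths beyond the number of 0 slots are left out.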
Upvotes: 1
Reputation: 780724
Loop through lst_A, get the filename prefix, then append the element from lst_B with the same prefix to the result list.
Create a set of all the elements from lst_B, and when you add a path to the result, remove it from the set. Then at the end you can go through this set, filling in the blank spaces in the result where there were no matches.
import os

lst_A =['/home/data_A/test_AA_123.jpg',
'/home/data_A/test_AB_234.jpg',
'/home/data_A/test_BB_321.jpg',
'/home/data_A/test_BC_112.jpg',
]
lst_B =['/home/data_B/test_AA_222.jpg',
'/home/data_B/test_CC_444.jpg',
'/home/data_B/test_AB_555.jpg',
'/home/data_B/test_BC_777.jpg',
]
new_lst_B = []
same_array = []
set_B = set(lst_B)
for fn in lst_A:
    prefix = "_".join(os.path.basename(fn).split('_')[:-1]) + '_'  # This gets test_AA_
    try:
        found_B = next(x for x in lst_B if os.path.basename(x).startswith(prefix))
        new_lst_B.append(found_B)
        same_array.append(1)
        set_B.remove(found_B)
    except StopIteration:  # No match found
        new_lst_B.append(None)  # Placeholder to fill in
        same_array.append(0)
# Walk lst_B (not the set itself) so leftovers are filled in a predictable order
for missed in (x for x in lst_B if x in set_B):
    index = new_lst_B.index(None)
    new_lst_B[index] = missed
print(new_lst_B)
print(same_array)
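The next(...) search rescans lst_B for every element of lst_A. A sketch of the same algorithm with a dictionary keyed by prefix instead, assuming prefixes are unique within lst_B (a later duplicate would overwrite an earlier one); prefix_of is a hypothetical helper that reproduces the prefix expression above:

```python
import os

lst_A = ['/home/data_A/test_AA_123.jpg',
         '/home/data_A/test_AB_234.jpg',
         '/home/data_A/test_BB_321.jpg',
         '/home/data_A/test_BC_112.jpg']
lst_B = ['/home/data_B/test_AA_222.jpg',
         '/home/data_B/test_CC_444.jpg',
         '/home/data_B/test_AB_555.jpg',
         '/home/data_B/test_BC_777.jpg']


def prefix_of(path):
    """Hypothetical helper: 'test_AA_' from '/home/data_A/test_AA_123.jpg'."""
    return "_".join(os.path.basename(path).split('_')[:-1]) + '_'


# One pass over lst_B; assumes prefixes are unique within lst_B.
by_prefix = {prefix_of(p): p for p in lst_B}

new_lst_B = []
same_array = []
for fn in lst_A:
    # pop() looks up and removes the match in one step; None if absent.
    found_B = by_prefix.pop(prefix_of(fn), None)
    new_lst_B.append(found_B)  # None is the placeholder to fill in
    same_array.append(0 if found_B is None else 1)

# Whatever is left had no match; fill the placeholders in lst_B order.
missed = iter(by_prefix.values())
new_lst_B = [p if p is not None else next(missed, None) for p in new_lst_B]

print(new_lst_B)
print(same_array)  # [1, 1, 0, 1]
```

Popping from the dict plays the role of removing from set_B, so no separate set is needed.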
Upvotes: 1