Reputation: 31
I have two lists, potentially of different lengths. Each list contains filenames as strings. I don't control the names, but I'm assured that the name structure won't change: it will always look like name1_name2_number1_+number2.jpg (or name1_name2_number1_-number2.jpg).
number1 is the part I want to match between the two lists: if a filename in one list has the same number1 as a filename in the other list, I want to append both of those filenames to a third list. I have a simple function that extracts the number1 values from a given list, e.g.:
>>> list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
>>> def GetNum(imgStrings):
...     ss = []
...     for b in imgStrings:
...         ss.append([w for w in b.split('_') if w.isdigit()])
...     # flatten zee list of lists because it is ugly.
...     return [val for subl in ss for val in subl]
...
>>> GetNum(list1)
['200', '8000']
So, for these two lists:
>>> list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
>>> list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
I'd like something that does this:
>>> awesomesauceSubstringMatcher(list1, list2)
['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']
I feel I should be able to do this with my GetNum function and some list comprehension, but the nifty '[blah for blah in ...]' syntax is new to me, and I can't quite wrap my head around this one. Thoughts? Suggestions? Death threats? Thanks in advance for all helpful responses, and a thousand apologies if my google-fu has failed me in finding a similar question/answer.
EDIT: I just figured out this solution:
[s for s in list1+list2 if any(subs in s for subs in GetNum(list1)) and any(subs in s for subs in GetNum(list2))]
I know it's long and ugly, but I really wanted to prove to myself that it could be done with list comprehension. Thanks for all the helpful responses!
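The one-liner above uses `in` on the whole filename, which is a substring test: a list1 number such as 200 would also "match" a list2 file whose number1 is 2000, and GetNum is recomputed for every element. A minimal sketch that compares the exact third field instead, assuming every filename has the name1_name2_number1_±number2.jpg shape (the helper names `number1` and `match_on_number1` are hypothetical):

```python
def number1(filename):
    # Third '_'-separated field, e.g. '8000' in 'inara03_kaley40_8000_-1.jpg'
    return filename.split('_')[2]

def match_on_number1(list1, list2):
    # Compute each side's numbers once, then keep only those present in both
    common = {number1(f) for f in list1} & {number1(f) for f in list2}
    # Same list-comprehension shape as the one-liner, but an exact comparison
    return [f for f in list1 + list2 if number1(f) in common]

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']
print(match_on_number1(list1, list2))
# ['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']
```

Building the set intersection first keeps the comprehension itself to a single membership check per filename.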
Upvotes: 2
Views: 2009
Reputation: 18312
list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']

def getNum(image_name_list):
    for s in image_name_list:
        s = s.split('_')[2]
        if s.isdigit():
            yield s
        else:
            yield None

def getMatchingIndex(list1, list2):
    # parse list2's numbers once instead of re-running getNum for every list1 item
    other_list = list(getNum(list2))
    for (i, num) in enumerate(getNum(list1)):
        if not num:
            continue
        for (j, other_num) in enumerate(other_list):
            if num == other_num:
                yield (i, j)

for i1, i2 in getMatchingIndex(list1, list2):
    print list1[i1], list2[i2]
Since each item only needs to be compared against every item in the second list, I made getNum a generator to save memory. Since a number might match more than once, I keep checking every item instead of stopping at the first match.
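The nested loop compares every number in list1 against every number in list2. If the lists grow large, one alternative (a sketch, not this answer's code) is to index list2's positions by number once, which still yields every (i, j) pair when a number repeats. The name `matching_indices` is hypothetical, and unlike getNum it assumes the third field is always the number, so it skips the `isdigit()` check:

```python
from collections import defaultdict

def matching_indices(list1, list2):
    # Map each number1 in list2 to every position where it occurs
    positions = defaultdict(list)
    for j, name in enumerate(list2):
        positions[name.split('_')[2]].append(j)
    # One dictionary lookup per list1 item instead of a full scan of list2;
    # .get() avoids inserting empty entries into the defaultdict
    for i, name in enumerate(list1):
        for j in positions.get(name.split('_')[2], []):
            yield (i, j)

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']
for i1, i2 in matching_indices(list1, list2):
    print(list1[i1] + ' ' + list2[i2])
# inara03_kaley40_8000_-1.jpg inara03_summer40_8000_-2.jpg
```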
Upvotes: 2
Reputation: 2641
My take on the solution, using map, filter, and list flattening with sum:
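l = ['a_b_1_2', 'b_c_2_3']
s = ['c_d_3_4', 'd_e_1_4']
# pair up the split names whose third field matches; non-matches become ''
a = map(lambda y: map(lambda z: [y, z] if y[2] == z[2] else '', map(lambda v: v.split('_'), s)), map(lambda x: x.split('_'), l))
# drop the '' placeholders, flatten twice (sum needs [] as its start value), and rejoin
map(lambda x: '_'.join(x), sum(filter(lambda qq: qq != '', sum(a, [])), []))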
Showing it on the actual dataset:
>>> list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
>>> list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
>>> a=map(lambda y: map(lambda z: [y,z] if y[2] == z[2] else '', map(lambda v:v.split('_'), list2)),map(lambda x:x.split('_'),list1))
>>> a
[['', '', ''], [[['inara03', 'kaley40', '8000', '-1.jpg'], ['inara03', 'summer40', '8000', '-2.jpg']], '', '']]
>>> sum(filter(lambda qq: qq != '', sum(a, [])), [])
[['inara03', 'kaley40', '8000', '-1.jpg'], ['inara03', 'summer40', '8000', '-2.jpg']]
>>> map(lambda x: '_'.join(x), sum(filter(lambda qq: qq != '', sum(a, [])), []))
['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg'] #This is the output you want.
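Since the question asks about list comprehensions, here is a sketch of the same pipeline (pair up names whose third field agrees, then flatten the pairs) written as one nested comprehension. Unlike the map version it also runs on Python 3, where `map` returns an iterator and `sum(a, [])` would fail. The function name `matched_pairs` is hypothetical:

```python
def matched_pairs(list1, list2):
    # Walk every (list1, list2) combination, keep those whose third
    # '_'-separated field (number1) agrees, and emit both names of each
    # matching pair, list1 name first, as one flat list
    return [name
            for x in list1
            for y in list2
            if x.split('_')[2] == y.split('_')[2]
            for name in (x, y)]

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']
print(matched_pairs(list1, list2))
# ['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']
```

Like the map/filter version, a filename is repeated once for each partner it matches, and because the original strings are kept there is no need to split and rejoin them.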
Upvotes: 0
Reputation: 113
This returns a list of lists: one inner list per distinct number1 found in either list, holding the filenames from both lists that share that number. For example, if 8000 and 300 both have matches, their inner lists hold the matching filenames, and the inner lists for numbers that appear in only one list stay empty.
list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg',
         'inara03_34simon_300_+1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']

def GetNum(imgStrings):
    ss = []
    for b in imgStrings:
        ss.append([w for w in b.split('_') if w.isdigit()])
    # flatten zee list of lists because it is ugly.
    return [val for subl in ss for val in subl]

print GetNum(list1)

def addToThird(input1, input2):
    # assumes each filename contributes exactly one number, so positions line up with the inputs
    numlist1 = GetNum(input1)
    numlist2 = GetNum(input2)
    # sorted() gives the groups a fixed order; iterating a bare set does not
    numgroups = sorted(set(numlist1 + numlist2))
    collectionsList = [[] for _ in numgroups]
    for i1, item1 in enumerate(numlist1):
        for i2, item2 in enumerate(numlist2):
            if item1 == item2:
                print item1, item2
                goindex = numgroups.index(item1)
                # take the names from their own lists by position
                collectionsList[goindex].append(input1[i1])
                collectionsList[goindex].append(input2[i2])
    return collectionsList

print addToThird(list1, list2)
output:
['200', '8000', '300']
8000 8000
300 300
[[], ['inara03_34simon_300_+1.jpg', 'summer53_21simon_300_-1.jpg'], [], ['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']]
Upvotes: 0
Reputation: 239
Parse the strings into data you can actually sift through. Things will be much easier then.
def process(filename):
    # drop the extension; rstrip('.jpg') would strip any trailing '.', 'j', 'p' or 'g' characters
    splitup = filename.rsplit('.', 1)[0].split('_')
    keys = ["name1", "name2", "number1", "number2"]
    r = dict(zip(keys, splitup))
    r["filename"] = filename
    return r

list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
plist1 = [process(f) for f in list1]
plist2 = [process(f) for f in list2]
nlist1 = [i['number1'] for i in plist1]
nlist2 = [i['number1'] for i in plist2]
ilist1 = [i for i in plist1 if i['number1'] in nlist2]
ilist2 = [i for i in plist2 if i['number1'] in nlist1]
intersection = set([i["filename"] for i in ilist1 + ilist2])
for i in intersection:
    print i
Edit: shoot, I see now you want intersections from both lists.
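If you'd rather have the parsing step reject names that don't fit the promised structure, instead of splitting on `_` and trusting the result, a regular expression with named groups is one option. The pattern below is an assumption based on the name1_name2_number1_+(or-)number2.jpg shape from the question; `FILENAME_RE` and `parse` are hypothetical names:

```python
import re

# Assumed shape: name1_name2_number1_(+|-)number2.jpg
FILENAME_RE = re.compile(
    r'^(?P<name1>[^_]+)_(?P<name2>[^_]+)_(?P<number1>\d+)_(?P<number2>[+-]\d+)\.jpg$')

def parse(filename):
    match = FILENAME_RE.match(filename)
    if match is None:
        # Fail loudly rather than silently mis-assigning fields
        raise ValueError('unexpected filename: %r' % filename)
    record = match.groupdict()
    record['filename'] = filename
    return record

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']
plist1 = [parse(f) for f in list1]
plist2 = [parse(f) for f in list2]
# number1 values present in both lists, then every record carrying one of them
common = {r['number1'] for r in plist1} & {r['number1'] for r in plist2}
matches = [r['filename'] for r in plist1 + plist2 if r['number1'] in common]
print(matches)
# ['inara03_kaley40_8000_-1.jpg', 'inara03_summer40_8000_-2.jpg']
```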
Upvotes: 0
Reputation: 1222
I would build a dictionary for each list where the key is the number1 from the filename and the value is the filename itself. Then intersect the two sets of keys; the common keys can then be used to build up the third list, e.g.:
def List2Dic(List):
    return dict(map(lambda x: [x.split("_")[2], x], List))

list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
d1 = List2Dic(list1)
d2 = List2Dic(list2)
for x in set(d1) & set(d2):
    print d1[x], d2[x]
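List2Dic keeps one filename per number: if a list holds two files with the same number1, the later one silently replaces the earlier. A sketch that keeps all of them by storing a list per key, then emits every cross-list pairing with `itertools.product`; `list_to_dict` is a hypothetical name, and sorting the common keys is only there to make the output order predictable:

```python
from collections import defaultdict
from itertools import product

def list_to_dict(names):
    # number1 -> every filename with that number (a plain dict keeps only the last)
    d = defaultdict(list)
    for name in names:
        d[name.split('_')[2]].append(name)
    return d

list1 = ['serentity01_20malcolm_200_+3.jpg', 'inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg',
         'summer53_21simon_300_-1.jpg']
d1 = list_to_dict(list1)
d2 = list_to_dict(list2)
# Same key intersection as above, but each common key yields all combinations
pairs = [pair
         for key in sorted(set(d1) & set(d2))
         for pair in product(d1[key], d2[key])]
for a, b in pairs:
    print(a + ' ' + b)
# inara03_kaley40_8000_-1.jpg inara03_summer40_8000_-2.jpg
```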
Upvotes: 0
Reputation: 3375
Untested, but the logic should be correct:
list1 = ['serentity01_20malcolm_200_+3.jpg','inara03_kaley40_8000_-1.jpg']
list2 = ['inara03_summer40_8000_-2.jpg', 'book23_42jayne_400_+2.jpg', 'summer53_21simon_300_-1.jpg']
list3 = []
seenInList1Dict = {}
for element in list1:
    splitelem = element.split('_')
    seenInList1Dict[splitelem[2]] = 1
for element in list2:
    splitelem = element.split('_')
    if splitelem[2] in seenInList1Dict:
        list3.append(element)
I didn't use your GetNum because it unnecessarily complicates things IMO. I find it's easier to just dump things into a dictionary if you want to quickly check for their existence later. Also, if you want the number, you just need to split the filename and grab the value at the appropriate index.
Upvotes: 0