Reputation: 507
This is somewhat related to this question.
I have two lists of URLs. The first list is:
http://example.com/1/1.jpg
http://example.com/2/2.jpg
http://example.com/3/3.jpg
...
http://example.com/45000/45000.jpg
The second list is a subset of the first one: it's made of the real URLs, the ones that are not broken links.
http://example.com/12/12.jpg
http://example.com/23/23.jpg
http://example.com/34/34.jpg
...
I would like to know how to arrange it so that I get something like this:
...
None
http://example.com/12/12.jpg
None
None
...
None
http://example.com/23/23.jpg
None
...
The point is to have a sorted list where the real URLs sit at the right position in the final CSV file.
I tried reading the first list and matching each item against the second list, but I can't get either the double loop or the pattern matching to work.
I read the lists from files using open(), which means I have to deal with line breaks (this seems to be an issue).
Upvotes: 2
Views: 85
Reputation: 1183
Let us say your first list (the superset) is l1 and the second (the subset) is l2.
l3 = []
for li in l1:
    if li in l2:
        l3.append(li)
    else:
        l3.append(None)
This will do. I am really not an expert in Python, so there might be better ways, but this is what I would use.
As per your comment: let us say you have two files, superset.txt (with all URLs) and subset.txt (with some URLs).
superset.txt
http://example.com/1/1.jpg
http://example.com/2/2.jpg
http://example.com/12/12.jpg
http://example.com/3/3.jpg
http://example.com/23/23.jpg
http://example.com/3/3.jpg
http://example.com/34/34.jpg
http://example.com/45000/45000.jpg
subset.txt
http://example.com/12/12.jpg
http://example.com/23/23.jpg
http://example.com/34/34.jpg
The script below reads them (from the same folder) and creates the required list.
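# Strip the line breaks while reading, so that a URL on the last line of a
# file (which may lack a trailing newline) still compares equal.
with open("superset.txt", "r") as f1:
    l1 = [line.strip() for line in f1]
with open("subset.txt", "r") as f2:
    l2 = [line.strip() for line in f2]

l3 = []
for li in l1:
    if li in l2:
        l3.append(li)
    else:
        l3.append(None)
print(l3)  # or you can save this to a file.
Result
[None, None, 'http://example.com/12/12.jpg', None, 'http://example.com/23/23.jpg', None, 'http://example.com/34/34.jpg', None]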
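Since the goal is a CSV file with each real URL on its original row, here is a minimal sketch of the saving step, written for Python 3. The helper name write_aligned_csv, the row-number column, and the choice to leave the cell empty for None are all assumptions made for illustration; they are not part of the answer above.

```python
import csv
import io


def write_aligned_csv(aligned, fileobj):
    """Write one row per entry: the row number, then the URL or an empty cell."""
    writer = csv.writer(fileobj, lineterminator="\n")
    for position, url in enumerate(aligned, start=1):
        # None is written as an empty cell (an assumption about the wanted format).
        writer.writerow([position, "" if url is None else url])


# Example data in the same shape as the aligned list (hypothetical values).
l3 = [None, "http://example.com/12/12.jpg", None, "http://example.com/23/23.jpg"]

# io.StringIO stands in for a real file opened with open(path, "w", newline="").
buffer = io.StringIO()
write_aligned_csv(l3, buffer)
csv_text = buffer.getvalue()
```

Passing an open file object instead of a path keeps the helper easy to try out in memory before pointing it at a real file.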
Upvotes: 1
Reputation: 10408
This should work:
list1 = ['http://example.com/1/1.jpg','http://example.com/2/2.jpg','http://example.com/3/3.jpg']
list2 = ['http://example.com/5/11.jpg','http://example.com/20/20.jpg','http://example.com/9/9.jpg','http://example.com/12/12.jpg']
# The largest numeric path segment decides the length of the final list.
length_of_list = max(set([int(i) for i in ''.join(list1+list2).split('/') if i.isdigit()]))
final_list = [None]*length_of_list
for i in list1+list2:
    # The first all-digit path segment is the URL's 1-based position.
    position = [int(x) for x in [s for s in i.split("/")] if x.isdigit()][0]
    final_list[position-1] = i
for x in final_list:
    print(x)
>>
http://example.com/1/1.jpg
http://example.com/2/2.jpg
http://example.com/3/3.jpg
None
http://example.com/5/11.jpg
None
None
None
http://example.com/9/9.jpg
None
None
http://example.com/12/12.jpg
None
None
None
None
None
None
None
http://example.com/20/20.jpg
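The same idea, placing each URL by the number in its path, can also be sketched with a regular expression and an explicit total length, so that only the list of real URLs is needed. This is a sketch for Python 3 under one assumption: every URL looks like http://example.com/<number>/<file>, and that number is the 1-based position. The names place_by_index and INDEX_PATTERN are hypothetical.

```python
import re

# Assumed URL shape: http://example.com/<number>/<file>.
# The first all-digit path segment is taken as the 1-based position.
INDEX_PATTERN = re.compile(r"/(\d+)/")


def place_by_index(urls, total):
    """Return a list of length `total` with each URL at index (number - 1)."""
    result = [None] * total
    for url in urls:
        match = INDEX_PATTERN.search(url)
        if match is None:
            continue  # skip URLs that do not follow the assumed shape
        position = int(match.group(1))
        if 1 <= position <= total:
            result[position - 1] = url
    return result


real = ["http://example.com/12/12.jpg", "http://example.com/23/23.jpg"]
aligned = place_by_index(real, 25)
```

With the data from the question, total would be 45000. Unlike the membership-test approaches, this one never needs the full list of candidate URLs, but it breaks as soon as a URL does not carry its position in the path.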
Upvotes: 0
Reputation: 52111
You can use a simple list comprehension with a conditional expression, like this:
>>> orig = ['http://example.com/1/1.jpg','http://example.com/2/2.jpg','http://example.com/3/3.jpg']
>>> real = ['http://example.com/1/1.jpg']
>>> [i if i in real else None for i in orig]
['http://example.com/1/1.jpg', None, None]
It would be better to store the real list in a set, because membership tests on a set are much faster than on a list. In that case, the code would be:
>>> orig = ['http://example.com/1/1.jpg','http://example.com/2/2.jpg','http://example.com/3/3.jpg']
>>> real = ['http://example.com/1/1.jpg']
>>> real_set = set(real)
>>> [i if i in real_set else None for i in orig]
['http://example.com/1/1.jpg', None, None]
Thanks to mata and Cuadue for the second version using sets. Check their comments below.
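To get a rough feel for the difference between the list and the set version, here is a small timing sketch for Python 3. The data is synthetic and the sizes are assumptions (5000 URLs with every tenth one treated as real, far fewer than the 45000 in the question, so that the list version finishes quickly). The function names with_list and with_set are made up for this example.

```python
import timeit

# Synthetic data; sizes are assumptions chosen to keep the run short.
orig = ["http://example.com/%d/%d.jpg" % (n, n) for n in range(1, 5001)]
real = orig[::10]  # pretend every tenth URL is a working link
real_set = set(real)


def with_list():
    # Each membership test scans the list element by element.
    return [i if i in real else None for i in orig]


def with_set():
    # Each membership test is a hash lookup.
    return [i if i in real_set else None for i in orig]


# Both versions must produce the same aligned list.
same_result = with_list() == with_set()

# Time one run of each; the absolute numbers depend on the machine.
list_seconds = timeit.timeit(with_list, number=1)
set_seconds = timeit.timeit(with_set, number=1)
print("list: %.4fs, set: %.4fs" % (list_seconds, set_seconds))
```

The gap grows with the size of the real list, since the list version does work proportional to len(orig) * len(real) while the set version stays roughly proportional to len(orig).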
Upvotes: 5