Reputation: 20637
I have a text file of URLs, about 14000. Below is a couple of examples:
http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123
http://www.domainname.com/images?IMAGE_ID=10
http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123
http://www.domainname.com/images?IMAGE_ID=11
http://www.domainname.com/pagename?CONTENT_ITEM_ID=102&param2=123
I have loaded the text file into a Python list and I am trying to get all the URLs with CONTENT_ITEM_ID separated off into a list of their own. What would be the best way to do this in Python?
Cheers
Upvotes: 5
Views: 5996
Reputation: 241790
I liked @bobince's answer (+1), but will up the ante.
Since you have a rather large starting set, you may wish to avoid loading the entire list into memory. Unless you need the whole list for something else, you could use a Python generator expression to perform the same task. It yields the filtered URLs one at a time as they're requested instead of building up a list:
# `file` is the open URL file; lines are filtered one at a time as the loop asks for them
for filtered_url in (line for line in file if 'CONTENT_ITEM_ID' in line):
    do_something_with_filtered_url(filtered_url)
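A minimal, self-contained sketch of the generator approach, assuming Python 3. Here io.StringIO stands in for the real URL file so the example runs on its own; with a real file you would iterate over the object returned by open() instead. The helper name iter_content_urls is hypothetical. Lines read from a file keep their trailing newline, so the sketch strips it.

```python
import io

# Stand-in for the real URL file (sample lines taken from the question).
URL_FILE = io.StringIO(
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123\n"
    "http://www.domainname.com/images?IMAGE_ID=10\n"
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123\n"
)


def iter_content_urls(lines):
    """Lazily yield URLs containing CONTENT_ITEM_ID, one at a time."""
    # Nothing is filtered until the caller asks for the next item.
    return (line.rstrip("\n") for line in lines if "CONTENT_ITEM_ID" in line)


for filtered_url in iter_content_urls(URL_FILE):
    print(filtered_url)
```

Because the generator is consumed as the loop runs, it can only be iterated once; call the helper again (on a fresh file object) if you need a second pass.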
Upvotes: 6
Reputation: 89171
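For completeness, you can also use itertools.ifilter. It works like filter, but returns an iterator instead of building up a list (Python 2 only; in Python 3 the built-in filter already behaves this way).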
from itertools import ifilter
for line in ifilter(lambda line: 'CONTENT_ITEM_ID' in line, urls):
    do_something(line)
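Since itertools.ifilter exists only in Python 2, here is a sketch of the same idea in Python 3, where the built-in filter already returns a lazy iterator and itertools.filterfalse yields the complement. The sample list comes from the question; has_content_item_id is a hypothetical helper name.

```python
from itertools import filterfalse

urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
]


def has_content_item_id(url):
    # Plain substring test, same as the lambda above.
    return "CONTENT_ITEM_ID" in url


# In Python 3, filter() is lazy: no list is built until you consume it.
content_urls = filter(has_content_item_id, urls)
# filterfalse() yields the items for which the predicate is false.
other_urls = filterfalse(has_content_item_id, urls)

for url in content_urls:
    print(url)
print(list(other_urls))
```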
Upvotes: 5
Reputation: 536399
Here's another alternative to Graeme's, using the newer list comprehension syntax:
list2 = [line for line in file if 'CONTENT_ITEM_ID' in line]
Which you prefer is a matter of taste!
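If you also want to keep the remaining URLs, the list comprehension would have to run twice (once with `in`, once with `not in`). A sketch that splits the list in a single pass instead; the function name partition_urls and its marker parameter are hypothetical.

```python
def partition_urls(urls, marker="CONTENT_ITEM_ID"):
    """Split urls into (matching, non_matching) lists in a single pass."""
    matching, non_matching = [], []
    for url in urls:
        # Append each URL to exactly one of the two lists.
        (matching if marker in url else non_matching).append(url)
    return matching, non_matching


urls = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=101&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=11",
]
content_urls, other_urls = partition_urls(urls)
print(len(content_urls), len(other_urls))  # 2 2
```

Both output lists preserve the original order of the URLs.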
Upvotes: 21
Reputation: 57248
list2 = filter(lambda x: x.find('CONTENT_ITEM_ID') != -1, list1)
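filter calls the function (first argument) on each element of list1 (second argument). If the function returns a true value (e.g. non-zero), the element is copied to the output list. (In Python 3, filter returns a lazy iterator instead of a list, so wrap the call in list() if you need one.)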
The lambda basically creates a temporary unnamed function. This is just to avoid having to create a function and then pass it, like this:
def look_for_content_item_id(elem):
    if elem.find('CONTENT_ITEM_ID') == -1:
        return 0
    return 1
list2 = filter(look_for_content_item_id, list1)
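Both versions above match the text anywhere in the line, so a hypothetical URL such as ...?OLD_CONTENT_ITEM_ID=5, or one with CONTENT_ITEM_ID in its path, would also be picked up. If that matters, a stricter sketch (Python 3) parses the query string with urllib.parse and checks for the parameter name itself. The function name has_content_item_param and the near-miss sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def has_content_item_param(url):
    """True only if CONTENT_ITEM_ID is an actual query-string parameter."""
    query = urlsplit(url.strip()).query
    # keep_blank_values=True so "?CONTENT_ITEM_ID=" still counts as present.
    return "CONTENT_ITEM_ID" in parse_qs(query, keep_blank_values=True)


list1 = [
    "http://www.domainname.com/pagename?CONTENT_ITEM_ID=100&param2=123",
    "http://www.domainname.com/images?IMAGE_ID=10",
    "http://www.domainname.com/pagename?OLD_CONTENT_ITEM_ID=5",  # hypothetical near-miss
]
# In Python 3, filter() returns an iterator, so wrap it in list().
list2 = list(filter(has_content_item_param, list1))
print(list2)
```

Parsing each URL is slower than a substring test, so for 14000 clean URLs the simpler check may well be good enough.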
Upvotes: 5