Radamand

Reputation: 185

Selecting items matching a specific condition from a list

I have a dict that contains keys like this:

237870a/
237870b/
237870c/
115460a/
115460b/
115460c/
115460d/
229898/
212365a/
109678/

I need to iterate over this list of keys and pull out certain items:

  1. For items that share the same numeric prefix and have an alphabetic character at the end, I need the item with the highest character, i.e. in this case 237870c, 115460d, and 212365a.

  2. Any other item with a unique number without a trailing alphabetic character, i.e. 229898 & 109678

So, my result should be:

237870c/
115460d/
229898/
212365a/
109678/

Sorry, I don't have any code to show, as I'm really not sure how to even start writing this...

Upvotes: 1

Views: 75

Answers (1)

Jonas Schäfer

Reputation: 20718

First of all, this has nothing to do with dictionaries: as you said yourself, you’re operating on a list of keys. The origin of the list isn’t important.

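You can use itertools.groupby for this, with a suitable key function. groupby only merges consecutive items that share a key, so unless equal prefixes are already adjacent (as they happen to be in your example), we first need to sort the keys. Note that sorting also changes the order of the result: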

keys = sorted(keys)
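To see why the sort matters, here is a small sketch with a made-up list (`letters` is only an illustration, not part of the question's data). Without sorting, groupby starts a new group every time the key changes, so the same key can show up more than once:

```python
import itertools

# Illustrative input: the two 'a' entries are not adjacent.
letters = ['a', 'b', 'a']

# groupby only merges runs of consecutive equal keys.
unsorted_groups = [key for key, _ in itertools.groupby(letters)]
sorted_groups = [key for key, _ in itertools.groupby(sorted(letters))]

print(unsorted_groups)  # ['a', 'b', 'a']
print(sorted_groups)    # ['a', 'b']
```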

Then we have to think about a key function. It must be designed so that only the numeric prefix is used for grouping. Your keys end in a slash, so the character to test is the last one before that slash:

def keyfunc(item):
    stem = item.rstrip('/')    # ignore the trailing slash
    if stem[-1:].isalpha():
        return stem[:-1]       # drop the letter suffix
    return stem

This strips the trailing slash and then the last character if it is alphabetic, so that itertools.groupby won’t take either into account when grouping. We’ll then take the last element of each sorted group, which will be the one with the highest alphabetic character.
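If you prefer to make the expected key format explicit, the same split can be sketched with a regular expression. This assumes every key is a run of digits, an optional single letter, and an optional trailing slash. The names `KEY_PATTERN` and `split_key` are hypothetical and used only for this illustration:

```python
import re

# Assumed format: digits, optional single letter, optional trailing slash.
KEY_PATTERN = re.compile(r'(\d+)([A-Za-z]?)/?')


def split_key(item):
    """Return (numeric_prefix, letter_suffix); the suffix may be ''."""
    match = KEY_PATTERN.fullmatch(item)
    if match is None:
        raise ValueError(f'unexpected key format: {item!r}')
    return match.group(1), match.group(2)


print(split_key('237870a/'))  # ('237870', 'a')
print(split_key('229898/'))   # ('229898', '')
```

The first element of the returned tuple could then serve as the grouping key, and a key that does not fit the assumed format raises an error instead of being grouped silently.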

Now we can apply groupby to obtain a list of items as you need:

import itertools

items = [sorted(subitems)[-1]
         for _, subitems
         in itertools.groupby(keys, keyfunc)]

See it in action:

>>> # output formatting and indentation by me
... 
>>> keys
['237870a/', '237870b/', '237870c/', '115460a/', 
 '115460b/', '115460c/', '115460d/', '229898/', 
 '212365a/', '109678/']
>>> def keyfunc(item):
...   stem = item.rstrip('/')
...   if stem[-1:].isalpha():
...     return stem[:-1]
...   return stem
... 
>>> items = [sorted(subitems)[-1] 
...          for _, subitems 
...          in itertools.groupby(keys, keyfunc)]
>>> items
['237870c/', '115460d/', '229898/', '212365a/', '109678/']
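If the original order of the keys matters, or equal prefixes are not guaranteed to be adjacent, a plain dict is one alternative to groupby that needs no sorting. This is only a sketch, not the approach above: `numeric_prefix` and `pick_highest` are hypothetical names, and it relies on dicts preserving insertion order (Python 3.7+):

```python
def numeric_prefix(item):
    """Return the item without its trailing '/' and optional final letter."""
    stem = item.rstrip('/')
    return stem[:-1] if stem[-1:].isalpha() else stem


def pick_highest(keys):
    """Keep, per numeric prefix, the key that compares highest as a string."""
    best = {}  # prefix -> highest key seen so far, in first-seen order
    for key in keys:
        prefix = numeric_prefix(key)
        if prefix not in best or key > best[prefix]:
            best[prefix] = key
    return list(best.values())


keys = ['237870a/', '237870b/', '237870c/', '115460a/', '115460b/',
        '115460c/', '115460d/', '229898/', '212365a/', '109678/']
print(pick_highest(keys))
# ['237870c/', '115460d/', '229898/', '212365a/', '109678/']
```

Each prefix keeps the position where it was first seen, so the result follows the order of the input list.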

Upvotes: 2
