Azubike

Reputation: 95

iterating over list containing duplicate values

I want to iterate over a list that contains duplicate values and give each occurrence of a room a letter suffix. Room 101 correctly gets 101.A and 101.B, but room 102 starts at 102.C instead of 102.A.

import string
room_numbers = ['101','103','101','102','104','105','106','107','102','108']
door_numbers = []
num_count = 0
for el in room_numbers:
    if room_numbers.count(el) == 1:
        door_numbers.append("%s.%s" % (el, string.ascii_uppercase[0]))
    elif room_numbers.count(el) > 1:
        door_numbers.append("%s.%s" % (el, string.ascii_uppercase[num_count]))
        num_count += 1

# Result I actually get:
door_numbers = ['101.A','103.A','101.B','102.C','104.A',
                '105.A','106.A','107.A','102.D','108.A']

Upvotes: 4

Views: 143

Answers (4)

pylang

Reputation: 44455

Given

import string
import itertools as it
import collections as ct


room_numbers = ['101','103','101','102','104','105','106','107','102','108']
letters = string.ascii_uppercase

Code

Simple, Two-Line Solution

dd = ct.defaultdict(it.count)    
print([".".join([room, letters[next(dd[room])]]) for room in room_numbers])

or

dd = ct.defaultdict(lambda: iter(letters))
print([".".join([room, next(dd[room])]) for room in room_numbers])

Output

['101.A', '103.A', '101.B', '102.A', '104.A', '105.A', '106.A', '107.A', '102.B', '108.A']

Details

In the first example we use itertools.count as the default factory. This means that a new count() iterator is created the first time a room number is looked up in the defaultdict dd, so every room number gets its own independent counter starting at 0. Iterators are useful here because they are lazily evaluated and memory efficient.

In the list comprehension, next() advances the counter that belongs to the current room number. The resulting number is used as an index into letters, and the letter is joined to the room number as a suffix.

In the second example (recommended), we use an iterator over the string of letters as the default factory. A default factory must be callable, so the iterator is returned from a lambda function. With an iterator over the letters, calling next() yields the next letter directly. Consequently, the comprehension is simpler because indexing into letters is no longer required.
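
As a rough illustration of the default factory behaviour described above, the following sketch shows that each key receives its own counter. The name counters and the three result variables are made up for this example and are not part of the answer's code.

```python
import collections
import itertools

# Looking up a missing key calls the default factory, so each room
# gets its own independent itertools.count() that starts at 0.
counters = collections.defaultdict(itertools.count)

first_101 = next(counters['101'])   # creates the counter for '101'
second_101 = next(counters['101'])  # advances that same counter
first_102 = next(counters['102'])   # creates a separate counter for '102'

print(first_101, second_101, first_102)  # 0 1 0
```

Note that both variants assume no room number appears more than 26 times, since there are only 26 uppercase letters to hand out.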

Upvotes: 1

Harvey

Reputation: 5821

Using iterators and comprehensions:

  1. Enumerate the rooms to preserve the original order
  2. Group rooms by room number, sorting first as required by groupby()
  3. For each room in a group, append .A, .B, etc.
  4. Sort by the enumeration values from step 1 to restore the original order
  5. Extract the door numbers, e.g. '101.A'


#!/usr/bin/env python3

import operator
from itertools import groupby
import string

room_numbers = ['101', '103', '101', '102', '104',
                '105', '106', '107', '102', '108']
get_room_number = operator.itemgetter(1)
enumerated_and_sorted = sorted(enumerate(room_numbers),
                               key=get_room_number)
# [(0, '101'), (2, '101'), (3, '102'), (8, '102'), (1, '103'),
#  (4, '104'), (5, '105'), (6, '106'), (7, '107'), (9, '108')]  
grouped_by_room = groupby(enumerated_and_sorted, key=get_room_number)
# [('101', [(0, '101'), (2, '101')]),
#  ('102', [(3, '102'), (8, '102')]),
#  ('103', [(1, '103')]),
#  ('104', [(4, '104')]),
#  ('105', [(5, '105')]),
#  ('106', [(6, '106')]),
#  ('107', [(7, '107')]),
#  ('108', [(9, '108')])] 
door_numbers = ((order, '{}.{}'.format(room, char))
                for _, room_list in grouped_by_room
                for (order, room), char in zip(room_list,
                                               string.ascii_uppercase))
# [(0, '101.A'), (2, '101.B'), (3, '102.A'), (8, '102.B'),
#  (1, '103.A'), (4, '104.A'), (5, '105.A'), (6, '106.A'),
#  (7, '107.A'), (9, '108.A')] 
door_numbers = [room for _, room in sorted(door_numbers)]
# ['101.A', '103.A', '101.B', '102.A', '104.A',
#  '105.A', '106.A', '107.A', '102.B', '108.A']                                         

Upvotes: 0

juanpa.arrivillaga

Reputation: 95873

The naive way is simply to count how many times the element appears in the list before the current index:

>>> door_numbers = []
>>> for i in range(len(room_numbers)):
...     el = room_numbers[i]
...     n = 0
...     for j in range(0, i):
...         n += el == room_numbers[j]
...     c = string.ascii_uppercase[n]
...     door_numbers.append("{}.{}".format(el, c))
...
>>> door_numbers
['101.A', '103.A', '101.B', '102.A', '104.A', '105.A', '106.A', '107.A', '102.B', '108.A']

These two explicit for-loops make the quadratic complexity obvious: for a list of N rooms, the inner loop body runs (1/2) * (N * (N-1)) times in total. In most cases you would be better off keeping a dict of counts instead of recounting each time.

>>> door_numbers = []
>>> counts = {}
>>> for el in room_numbers:
...     count = counts.get(el, 0)
...     c = string.ascii_uppercase[count]
...     counts[el] = count + 1
...     door_numbers.append("{}.{}".format(el, c))
...
>>> door_numbers
['101.A', '103.A', '101.B', '102.A', '104.A', '105.A', '106.A', '107.A', '102.B', '108.A']

That way, there's no messing around with indices, and it's more time efficient (at the expense of auxiliary space).
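
As a possible variation on the dict-of-counts idea, here is a minimal sketch that wraps it in a function and uses collections.Counter in place of the plain dict. The function name label_doors is hypothetical, and the sketch assumes no room appears more than 26 times.

```python
import string
from collections import Counter


def label_doors(room_numbers):
    """Suffix each room with A, B, C... in order of appearance.

    Hypothetical helper; raises IndexError past 26 repeats of one room.
    """
    seen = Counter()  # how many times each room has occurred so far
    doors = []
    for room in room_numbers:
        # A Counter returns 0 for rooms it has not seen yet.
        letter = string.ascii_uppercase[seen[room]]
        seen[room] += 1
        doors.append("{}.{}".format(room, letter))
    return doors


rooms = ['101', '103', '101', '102', '104',
         '105', '106', '107', '102', '108']
print(label_doors(rooms))
```

Each room is looked up once per occurrence, so the work grows linearly with the length of the list.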

Upvotes: 0

Sudheesh Singanamalla

Reputation: 2297

The problem in your implementation is that num_count is a single value that is incremented for every duplicated item in the list, rather than tracking a separate count for each item. What you have to do instead is count the number of times each individual item has occurred in the list so far.
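
To make the shared counter visible, this small sketch reruns the duplicate branch of the question's loop and records the value of num_count each time it is used. The trace list is added only for illustration.

```python
import string

room_numbers = ['101', '103', '101', '102', '104',
                '105', '106', '107', '102', '108']

num_count = 0  # one counter shared by every duplicated room
trace = []     # illustrative: (room, index used, letter chosen)
for el in room_numbers:
    if room_numbers.count(el) > 1:
        trace.append((el, num_count, string.ascii_uppercase[num_count]))
        num_count += 1

for room, index, letter in trace:
    print(room, index, letter)
# 101 0 A
# 101 1 B
# 102 2 C
# 102 3 D
```

Room 102 picks up where room 101 left off, which is why it is labelled C and D in the question's output.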

Pseudocode would be:

  1. For each room in room_numbers:
  2. Add the room to a list of visited rooms.
  3. Count how many times the room number appears in the visited rooms.
  4. Add the count to 64 and convert the sum to an ASCII uppercase character (65 = A).
  5. Join the strings in the format you want and append the result to the door_numbers list.

Here's an implementation

import string
room_numbers = ['101','103','101','102','104','105','106','107','102','108']
door_numbers = []

visited_rooms = []
for room in room_numbers:
    visited_rooms.append(room)
    room_count = visited_rooms.count(room)
    door_value = chr(64+room_count) # Since 65 = A when 1st item is present
    door_numbers.append("%s.%s"%(room, door_value))

For the given input room_numbers, door_numbers now contains the list you're expecting:

['101.A', '103.A', '101.B', '102.A', '104.A', '105.A', '106.A', '107.A', '102.B', '108.A']

Upvotes: 0
