Mass17

Reputation: 1605

Append multiple values to a dictionary key but avoid repeated elements in the value (Python)

I have a list like this:

[('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0018_RIGHT_CC', [(1388, 1653), (1894, 2197)]),
 ('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)])]

From this I want to create a dictionary like this, for example:

0106_LEFT_CC: [[(911, 1135), (0, 86)], [(1091, 1198), (65, 160)], [(1679, 1833), (964, 1102)]]

Please note that repeated elements in the values of the dictionary should be avoided. I did the following:

from collections import defaultdict

d = defaultdict(list)
for key, value in data:
    d[key].append(value)

and I got the following result.

defaultdict(list,
            {'0023_RIGHT_CC': [[(2574, 2798), (1324, 1545)],
              [(2574, 2798), (1324, 1545)]],
             '0021_LEFT_CC': [[(1180, 1420), (883, 1140)],
              [(1180, 1420), (883, 1140)]],
             '0106_LEFT_CC': [[(911, 1135), (0, 86)],
              [(1679, 1833), (964, 1102)],
              [(1091, 1198), (65, 160)],
              [(911, 1135), (0, 86)],
              [(1679, 1833), (964, 1102)],
              [(1091, 1198), (65, 160)]],
             '0026_LEFT_CC': [[(3738, 3968), (2144, 2352)],
              [(3738, 3968), (2144, 2352)]],
             '0021_RIGHT_CC': [[(2170, 2314), (1642, 1795)],
              [(2170, 2314), (1642, 1795)]],
             '0018_RIGHT_CC': [[(1388, 1653), (1894, 2197)]]})

But the result shows that, for example, under the key 0106_LEFT_CC some values are repeated. How can I avoid this?

Upvotes: 3

Views: 1124

Answers (2)

Ekrem Dinçel

Reputation: 1141

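You can convert them into a set and then back into a list, which gets rid of repeated elements. Note that this merges the pairs for each key into one flat list of unique tuples, and that a set does not preserve the original order.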

from collections import defaultdict

data = [('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0018_RIGHT_CC', [(1388, 1653), (1894, 2197)]),
 ('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)])]

d = defaultdict(list)
for key, value in data:
    d[key] = list(set(d[key] + value))
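
As a quick check of what the set round-trip produces, here is a small sketch on a shortened sample (the name sample and the three entries are chosen only for illustration). It shows that the duplicates disappear but the pairs end up flattened:

```python
from collections import defaultdict

# Shortened sample: the third entry is an exact repeat of the first.
sample = [
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
    ("0106_LEFT_CC", [(1679, 1833), (964, 1102)]),
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
]

d = defaultdict(list)
for key, value in sample:
    # d[key] and value are both lists of tuples, so the set holds tuples
    d[key] = list(set(d[key] + value))

flat = d["0106_LEFT_CC"]
# The duplicates are gone, but the pairs have been merged into one flat
# list of tuples, and a set does not guarantee any particular order.
print(sorted(flat))
```

If the pairs need to stay together as in the question, this is not quite the requested shape.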


EDIT: I changed @JST99's answer a bit so that it uses a set for the membership (in) checks and then appends the value to a list. This is faster on large data sets, but for your current data @JST99's answer is faster.

from collections import defaultdict


data = [('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0021_LEFT_CC', [(1180, 1420), (883, 1140)]),
 ('0021_RIGHT_CC', [(2170, 2314), (1642, 1795)]),
 ('0106_LEFT_CC', [(911, 1135), (0, 86)]),
 ('0106_LEFT_CC', [(1679, 1833), (964, 1102)]),
 ('0106_LEFT_CC', [(1091, 1198), (65, 160)]),
 ('0018_RIGHT_CC', [(1388, 1653), (1894, 2197)]),
 ('0023_RIGHT_CC', [(2574, 2798), (1324, 1545)]),
 ('0026_LEFT_CC', [(3738, 3968), (2144, 2352)])]


d = defaultdict(list)
s = defaultdict(set)
for key, value in data:
    s_value = tuple(value)  # lists are unhashable, so we use a tuple for the set
    if s_value not in s[key]:
        d[key].append(value)
        s[key].add(s_value)

del s  # we don't need s anymore
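
For reuse, the same idea can be wrapped in a function. This is only a sketch: the name group_unique and the shortened sample are hypothetical and not part of the original code.

```python
from collections import defaultdict

def group_unique(pairs):
    """Group values by key, keeping only the first occurrence of each value."""
    grouped = defaultdict(list)  # key -> list of unique values, in input order
    seen = defaultdict(set)      # key -> hashable markers of values already added
    for key, value in pairs:
        marker = tuple(value)    # lists are unhashable, tuples of tuples are not
        if marker not in seen[key]:
            seen[key].add(marker)
            grouped[key].append(value)
    return dict(grouped)

# Shortened sample for illustration only.
sample = [
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
    ("0021_LEFT_CC", [(1180, 1420), (883, 1140)]),
    ("0106_LEFT_CC", [(1679, 1833), (964, 1102)]),
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
    ("0021_LEFT_CC", [(1180, 1420), (883, 1140)]),
]

result = group_unique(sample)
print(result)
```

Because the helper set lives inside the function, there is nothing to delete afterwards, and the order of first appearance is kept.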

Upvotes: 4

Jake Tae

Reputation: 1741

@Ekrem DİNÇEL has already supplied a fully functional solution that deserves to be accepted. Still, purely for the OP's reference, I am posting an alternative that avoids the repeated casting from list to set and back.

That is, if you are okay with the values of the resulting dictionary being sets instead of lists, you can initialize the defaultdict with set values and let the sets discard duplicate elements. Note that this stores the individual tuples, so the pairs are not kept together.

from collections import defaultdict

d = defaultdict(set)

for key, values in data:
    for value in values:
        d[key].add(value)

The result:

{'0023_RIGHT_CC': {(1324, 1545), (2574, 2798)}, '0021_LEFT_CC': {(1180, 1420), (883, 1140)}, '0106_LEFT_CC': {(0, 86), (964, 1102), (1679, 1833), (65, 160), (911, 1135), (1091, 1198)}, '0026_LEFT_CC': {(3738, 3968), (2144, 2352)}, '0021_RIGHT_CC': {(1642, 1795), (2170, 2314)}, '0018_RIGHT_CC': {(1388, 1653), (1894, 2197)}}
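
If the pairs should stay together while still using sets, one possible variation (a sketch on a shortened sample, with sorting added as my own assumption about the desired order) is to add each pair as a tuple and convert back to lists at the end:

```python
from collections import defaultdict

# Shortened sample for illustration: each pair appears twice.
sample = [
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
    ("0106_LEFT_CC", [(1679, 1833), (964, 1102)]),
    ("0106_LEFT_CC", [(1091, 1198), (65, 160)]),
    ("0106_LEFT_CC", [(911, 1135), (0, 86)]),
    ("0106_LEFT_CC", [(1679, 1833), (964, 1102)]),
    ("0106_LEFT_CC", [(1091, 1198), (65, 160)]),
]

d = defaultdict(set)
for key, value in sample:
    d[key].add(tuple(value))  # the whole pair becomes one hashable element

# Sets are unordered, so sort the pairs before turning them back into lists.
result = {key: [list(pair) for pair in sorted(pairs)] for key, pairs in d.items()}
print(result)
```

Sorting here orders the pairs by their first tuple, which is a choice made for this sketch; the input order is not recoverable from a set.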

EDIT

This implementation checks whether the value already exists in the list for that key and appends it only if it is not a duplicate. This way, we preserve the list-of-pairs structure as specified. Note, however, that this is less efficient on large inputs because the in operation takes linear time on a list (as opposed to constant time on average on a set).

d = defaultdict(list)

for key, value in data:
    if value not in d[key]:
        d[key].append(value)

The result:

{'0023_RIGHT_CC': [[(2574, 2798), (1324, 1545)]], '0021_LEFT_CC': [[(1180, 1420), (883, 1140)]], '0106_LEFT_CC': [[(911, 1135), (0, 86)], [(1679, 1833), (964, 1102)], [(1091, 1198), (65, 160)]], '0026_LEFT_CC': [[(3738, 3968), (2144, 2352)]], '0021_RIGHT_CC': [[(2170, 2314), (1642, 1795)]], '0018_RIGHT_CC': [[(1388, 1653), (1894, 2197)]]}
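
The linear versus constant lookup cost mentioned above can be observed with a rough timing sketch. The size n and the repeat count are arbitrary values picked for illustration, and the actual numbers will vary by machine:

```python
import timeit

n = 50_000
as_list = [(i, i + 1) for i in range(n)]
as_set = set(as_list)
missing = (-1, -1)  # not present, so the list has to compare every element

# Time 50 membership checks against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=50)
set_time = timeit.timeit(lambda: missing in as_set, number=50)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

For a handful of values per key, as in the question, the difference is negligible and the plain list check is perfectly fine.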

Upvotes: 3
