Adarsh
Adarsh

Reputation: 3573

Swapping the inner and outer keys of a nested dictionary

I have the following dictionary

{'a':{'se':3, 'op':2}, 'b':{'se':4, 'op':3}}

I need to convert it as follows:

{'se':{'a':3, 'b': 4}, 'op':{'a':2,'b':3}}

This is the following code I could come up with:

from collections import defaultdict

a = {'a':{'se':3, 'op':2}, 'b':{'se':4, 'op':3}}
b = defaultdict(dict)
for key1, value1 in a.items():
    for key2, value2 in value1.items():
        b[key2].update({key1: value2})

The following gets the job done but I am fond of one-liners. Is there a one-liner to the above or even a better way (better performance such as to eliminate two loops)?

Upvotes: 2

Views: 1903

Answers (2)

Konchog
Konchog

Reputation: 2188

So this improves @cs95, and provides a more readable 1 line. There are 2 lines here - but one may already have the inner keys ('k'). The point is that you can use the dict 'a' to transfer the values.

a = {'a':{'se':3, 'op':2}, 'b':{'se':4, 'op':3}}
k = list(a.values())[0].keys()
b = {i: {o: a[o][i] for o in a} for i in k}  # one line dict inversion
print(f'{a}\n{b}')

However, if you are doing this, you may not be using the best data structure; instead you could use a dictionary keyed by tuples, such as

a = {('a', 'se'):3, ('a', 'op'):2, ('b', 'se'):4, ('b', 'op'):3}

You can then sort by tuple position, and filter by tuple key

c = sorted(a, key=lambda x:x[1])
d = sorted(a, key=lambda x:x[0])
e = list(filter(lambda x:x[0] == 'a', a))  # list 
print(f'a: {a}\nc: {c}\nd: {d}\ne: {e}')

yields

a: {('a', 'se'): 3, ('a', 'op'): 2, ('b', 'se'): 4, ('b', 'op'): 3}
c: [('a', 'op'), ('b', 'op'), ('a', 'se'), ('b', 'se')]
d: [('a', 'se'), ('a', 'op'), ('b', 'se'), ('b', 'op')]
e: [('a', 'se'), ('a', 'op')]

Of course you can still access objects using keys:

x = a['a', 'op']  # returns 2

Even better maybe, if you are using a fixed set of keys, is to use a tuple of enums rather than str.

Upvotes: 3

abarnert
abarnert

Reputation: 365875

Turning this into a one-liner is doable in multiple ways, but all of them are going to be ugly. For example:

# Gather k2-k1-v2
g = ((k2,k1,v2) for k1,v1 in a.items() for k2,v2 in v1.items())
# Sort by k2
sg = sorted(g)
# Group by k2
gsg = itertools.groupby(sg, key=operator.itemgetter(0))
# Turn it into a dict of dicts
b = {k: {ksk[1]: ksk[2] for ksk in group} for (k, group) in gsg}

Putting it all together:

b = {k: {ksk[1]: ksk[2] for ksk in group} 
     for (k, group) in itertools.groupby(
         sorted(((k2,k1,v2) for k1,v1 in a.items() for k2,v2 in v1.items())),
         key=operator.itemgetter(0))}

Well, it's one expression, and you can put it all on one line if you don't know how many columns it is. But it's certainly not as readable as your original version.

And as for performance? It takes about twice as long from a quick test. Coldspeed's version is somewhere in between. Changing one list to an iterator makes it slightly slower on small dicts like your original example, but much faster on big ones, but either way, it didn't beat the original in any tests, and got slower than the itertools version with very big values. But of course if performance actually matters, you should measure them on your real data.

And if you think about it, there can't be any way to eliminate the nested loop (except by replacing one of the loops with something equivalent, like recursion, or unrolling it based on the fact that your example happens to have exactly two items in each inner dict—which presumably isn't true for your real problem). After all, you have to visit each key of each inner dictionary, which you can't do without a loop over the outer dictionary. It is possible to turn those loops into comprehension loops instead of statements, or C loops inside map or list (or maybe inside some Pandas function?). Both my version and Coldspeed's put the nested loop in a comprehension, and at least one of the additional linear loops (which don't add to the algorithmic complexity, but probably do add significantly to the real-life time for small collections like your example) are buried inside a builtin function. But speeding up the looping while doing more overall work isn't always a worthwhile tradeoff.

Upvotes: 0

Related Questions