Reputation: 7907
I have a list of elements, with associated sorting keys
data = [
{'data': 0},
{'key': 'foo', 'data': 1},
{'key': 'bar', 'data': 2},
{'data': 3},
{'key': 'foo', 'data': 4},
{'key': 'bar', 'data': 5},
{'key': 'bar', 'data': 6},
{'data': 7}
]
Some elements has sorting key
, some does not. I have an external table with sorting for some keys:
ORDER = ['bar', 'foo', 'baz']
Expected result:
[
{'data': 0},
{'key': 'bar', 'data': 2},
{'key': 'bar', 'data': 5},
{'data': 3},
{'key': 'bar', 'data': 6},
{'key': 'foo', 'data': 1},
{'key': 'foo', 'data': 4},
{'data': 7}
]
I need to sort the array in the way that only elements with key are moved, and elements with the same key are not moved. (Basically, I have a partial order for elements with requirements to not to swap elements if key comparison does not require so). As far as I can see, existing list.sort
/sorted
(in Python) does not support this type of ordering.
I strongly suspect this problem was solved many times, but I can't find a proper name to search.
How this type of sorting is called?
Upvotes: 3
Views: 714
Reputation: 61910
You can do this in O(n) time, if you have few keys, using something like a bucket sort for example:
import pprint
from collections import defaultdict
from itertools import chain
# create buckets
buckets, no_key = defaultdict(list), {}
for i, d in enumerate(data):
if 'key' in d:
buckets[d['key']].append(d)
else:
no_key[i] = d
# create one sorted list for elements with keys, basically iterate over the buckets in ORDER
chained = list(chain.from_iterable([buckets[key] for key in ORDER]))[::-1]
# combine the two iterables, chained and no_key into the final result
final = [no_key.pop(i) if i in no_key else chained.pop() for i in range(len(data))]
pprint.pprint(final)
Output
[{'data': 0},
{'data': 2, 'key': 'bar'},
{'data': 5, 'key': 'bar'},
{'data': 3},
{'data': 6, 'key': 'bar'},
{'data': 1, 'key': 'foo'},
{'data': 4, 'key': 'foo'},
{'data': 7}]
Upvotes: 2
Reputation: 402713
Create an iterator of sorted records:
it1 = iter(sorted((d for d in data if 'key' in d),
key=lambda d_: ORDER.index(d_['key'])))
Next, write a list comprehension to selectively overwrite records with "key", leaving others unchanged.
res = [d if 'key' not in d else next(it1) for d in data]
print (res)
# [{'data': 0},
# {'key': 'bar', 'data': 2},
# {'key': 'bar', 'data': 5},
# {'data': 3},
# {'key': 'bar', 'data': 6},
# {'key': 'foo', 'data': 1},
# {'key': 'foo', 'data': 4},
# {'data': 7}]
Some points:
it might be worth converting ORDER
into a dictionary mapping keys to positions, so the lookup is constant time. That should bring down the complexity of the sorting step.
sorted
returns a list, which is then converted to an iterator, so the intermediate list cannot be avoided.
Upvotes: 4