Reputation: 1239
I read data from a bunch of emails and count the frequency of each word. I first construct two counters:
counters.stats = collections.defaultdict(dict)
counters.chi = collections.Counter()
The keys of stats are words. For each word, I construct a dict whose keys are the names of the emails and whose values are the frequency of that word in each email.
The keys of chi are the same words as those in stats. I want to sort the keys in 'stats' by the values in 'chi'. The problem is fixed by:
def print_stats(counters):
    sorted_key = sorted(counters.stats, key=counters.chi.get)
    result = collections.OrderedDict((k, counters.stats[k]) for k in sorted_key)
    for form, cat_to_stats in result.items():
        pass  # loop body omitted
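For reference, a minimal self-contained sketch of this setup. The `emails` data is made up, and using the total count of each word as its `chi` score is an assumption made only for illustration:

```python
import collections

# Hypothetical input: email name -> text of the email.
emails = {
    'email1': 'the cat or a dog',
    'email2': 'a cat a dog',
}

stats = collections.defaultdict(dict)  # word -> {email name: frequency}
chi = collections.Counter()            # word -> score (assumed here: total count)

for name, text in emails.items():
    for word, freq in collections.Counter(text.split()).items():
        stats[word][name] = freq
        chi[word] += freq

# Sort the words by their score, then rebuild stats in that order.
sorted_keys = sorted(stats, key=chi.get)
result = collections.OrderedDict((k, stats[k]) for k in sorted_keys)
```

Iterating over `result` then yields the words from the lowest score to the highest, each with its per-email frequencies.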
Upvotes: 1
Views: 354
Reputation: 151087
If I understand you correctly, this should do what you want:
>>> stats = {'a': {'email1':4, 'email2':3},
... 'the': {'email1':2, 'email3':4},
... 'or': {'email1':2, 'email3':1}}
>>> chi = {'a': 7, 'the':6, 'or':3}
>>> sorted(stats, key=chi.get)
['or', 'the', 'a']
Let me know if this works for you. Also, as Boud mentioned above, you should consider numpy/scipy, which would probably provide better performance -- and would definitely provide lots of built-in functionality.
Since you say this doesn't work -- for reasons you haven't yet explained -- here's a more general example of how to use the key argument. It shows that get works with Counter objects as well as standard dicts, and also how to pass your own key function:
>>> import collections
>>> stats = {'a': {'email1':4, 'email2':3},
... 'the': {'email1':2, 'email3':4},
... 'or': {'email1':2, 'email3':1}}
>>> wordlists = ([k] * sum(d.itervalues()) for k, d in stats.iteritems())
>>> chi = collections.Counter(word for seq in wordlists for word in seq)
>>> sorted(stats, key=chi.get)
['or', 'the', 'a']
>>> sorted(stats, key=lambda x: chi[x] + 3)
['or', 'the', 'a']
>>> sorted(stats, key=chi.get, reverse=True)
['a', 'the', 'or']
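One caveat worth sketching, assuming some word in stats might be absent from chi (the word 'new' below is made up for the purpose): get on a Counter is the ordinary dict get, so it returns None for a missing key, while indexing a Counter falls back to 0. On Python 3, sorting with a None key next to integers raises a TypeError, so an explicit default is safer:

```python
import collections

stats = {'a': {'email1': 4}, 'the': {'email1': 2}, 'new': {'email2': 1}}
chi = collections.Counter({'a': 7, 'the': 6})  # 'new' is deliberately missing

# Counter.get is plain dict.get: it returns None for a missing key ...
missing_via_get = chi.get('new')
# ... whereas indexing a Counter returns 0 for a missing key.
missing_via_index = chi['new']

# An explicit default keeps the key function well defined for every word.
ordered = sorted(stats, key=lambda word: chi.get(word, 0))
```

Here `ordered` puts 'new' first, because its score defaults to 0.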
I still don't completely understand what you're looking for, but perhaps you mean to get a sorted list of key, value tuples?
>>> sorted(stats.iteritems(), key=lambda x: chi[x[0]])
[('or', {'email3': 1, 'email1': 2}),
('the', {'email3': 4, 'email1': 2}),
('a', {'email2': 3, 'email1': 4})]
I would actually recommend splitting this up though:
>>> sorted_keys = sorted(stats, key=chi.get)
>>> [(k, stats[k]) for k in sorted_keys]
[('or', {'email3': 1, 'email1': 2}), ('the', {'email3': 4, 'email1': 2}), ('a', {'email2': 3, 'email1': 4})]
You said you want something sorted by the values in chi, but "with the same structure as stats." That's not possible because dictionaries don't have an order; the closest you can come is a sorted list of tuples, or an OrderedDict (in 2.7+).
>>> collections.OrderedDict((k, stats[k]) for k in sorted_keys)
OrderedDict([('or', {'email3': 1, 'email1': 2}), ('the', {'email3': 4, 'email1': 2}), ('a', {'email2': 3, 'email1': 4})])
If you have to reorder the dictionary frequently, though, rebuilding an OrderedDict every time is kind of pointless.
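If the ordering does change often, one option is to leave stats as a plain dict and compute the ordered pairs only when you need them. A rough sketch; `items_sorted_by` is a hypothetical helper name, not something from the standard library:

```python
def items_sorted_by(stats, scores, reverse=False):
    """Return (word, per-email counts) pairs ordered by scores[word]."""
    ordered_words = sorted(stats, key=lambda w: scores.get(w, 0), reverse=reverse)
    return [(word, stats[word]) for word in ordered_words]

stats = {'a': {'email1': 4, 'email2': 3},
         'the': {'email1': 2, 'email3': 4},
         'or': {'email1': 2, 'email3': 1}}
chi = {'a': 7, 'the': 6, 'or': 3}

before = items_sorted_by(stats, chi)

# When the scores change, nothing has to be rebuilt; just ask again.
chi['or'] = 10
after = items_sorted_by(stats, chi)
```

The trade-off is that every call pays for a fresh sort, so this only makes sense if you read the ordering less often than the scores change.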
Upvotes: 3