Reputation:
Suppose we were interested in the most often-occurring time zones in the data set (the tz field). There are many ways we could do this. First, let’s extract a list of time zones again using a list comprehension:
In [26]: time_zones = [rec['tz'] for rec in records if 'tz' in rec]
In [27]: time_zones[:10]
Out[27]: [u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York', u'America/New_York', u'Europe/Warsaw', u'', u'', u'']
Now, to produce counts by time zone:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
This is an excerpt from a textbook. I do not quite understand the loop used to count the occurrences of each time zone. Can someone please break it down intuitively for me? I'm a beginner.
Follow up question:
If we wanted the top 10 time zones and their counts, we have to do a little bit of dictionary acrobatics:
def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    value_key_pairs.sort()
    return value_key_pairs[-n:]
The quotations mark the excerpt. Could someone please explain what goes on in the function top_counts?
Upvotes: 1
Views: 501
Reputation: 39
Re: followup question.
def top_counts(count_dict, n=10):
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()] # Converts the dictionary into a list of tuples, i.e. {'aaa': 1, 'bbb': 12, 'ccc': 4} into [(1, 'aaa'), (12, 'bbb'), (4, 'ccc')]
    value_key_pairs.sort() # Sorts the list in ascending order. The default comparison for tuples compares the first elements first, and only looks at the second elements if the first are equal.
    return value_key_pairs[-n:] # Returns the slice of the sorted list that holds the last n elements, i.e. the n largest counts.
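To make those three steps concrete, here is a small sketch that runs the function on the toy dictionary from the comment above. The sample counts are made up for illustration and are not from the book's data set. It is written for Python 3, but the logic is the same as in the book's Python 2 code.

```python
def top_counts(count_dict, n=10):
    # Flip each (key, value) pair so that the count comes first.
    value_key_pairs = [(count, tz) for tz, count in count_dict.items()]
    # Tuples compare element by element, so this sorts by count, ascending.
    value_key_pairs.sort()
    # The last n entries of the sorted list hold the n largest counts.
    return value_key_pairs[-n:]

# Hypothetical sample counts, for illustration only.
sample = {'aaa': 1, 'bbb': 12, 'ccc': 4}
top_two = top_counts(sample, n=2)
print(top_two)  # [(4, 'ccc'), (12, 'bbb')]
```

Note that the result is still in ascending order, so the most frequent entry comes last. If n is larger than the number of entries, the slice simply returns the whole sorted list.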
Upvotes: 0
Reputation: 222
The function get_counts does the following:
For each timezone in the list:
Check if the timezone is already in the dictionary (if x in counts).
If so, increment the number of occurrences by 1 (counts[x] += 1).
If not, initialize the count to 1 (counts[x] = 1).
In case you are curious, you could also do it like this:
from collections import Counter
ctr = Counter()
for x in sequence:
    ctr[x] += 1
The Counter automatically returns 0 for missing items, so you don't need to initialize it.
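As a further sketch, Counter can also be given the whole sequence at once, and its most_common method returns the top entries directly. The list below copies the ten sample values from the question; this is Python 3, so the u prefixes are dropped.

```python
from collections import Counter

# The ten sample values shown in the question.
time_zones = ['America/New_York', 'America/Denver', 'America/New_York',
              'America/Sao_Paulo', 'America/New_York', 'America/New_York',
              'Europe/Warsaw', '', '', '']

# Passing an iterable to Counter counts every item in one step.
ctr = Counter(time_zones)

# A key that was never seen reads as 0 instead of raising KeyError.
print(ctr['Asia/Tokyo'])  # 0

# most_common(n) returns the n highest counts, largest first.
top_two = ctr.most_common(2)
print(top_two)  # [('America/New_York', 4), ('', 3)]
```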
Upvotes: 0
Reputation: 180787
def get_counts(sequence):      # Defines the function.
    counts = {}                # Creates an empty dictionary.
    for x in sequence:         # Loops through each item in sequence
        if x in counts:        # If item already exists in dictionary
            counts[x] += 1     # Add one to the current item in dictionary
        else:                  # Otherwise...
            counts[x] = 1      # Add item to dictionary, give it a count of 1
    return counts              # Returns the resulting dictionary.
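For a quick check of what the function returns, here is a minimal usage sketch in Python 3 that calls it on the ten sample values from the question.

```python
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1   # seen before: add one
        else:
            counts[x] = 1    # first sighting: start at one
    return counts

# The ten sample values shown in the question.
time_zones = ['America/New_York', 'America/Denver', 'America/New_York',
              'America/Sao_Paulo', 'America/New_York', 'America/New_York',
              'Europe/Warsaw', '', '', '']

counts = get_counts(time_zones)
print(counts['America/New_York'])  # 4
print(counts[''])                  # 3 (records with an empty tz string)
```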
Upvotes: 8
Reputation: 5811
Given that the sequence is [u'America/New_York', u'America/Denver', u'America/New_York', u'America/Sao_Paulo', u'America/New_York', u'America/New_York', u'Europe/Warsaw', u'', u'', u''], it will go like this:
for x in sequence:         # traverse sequence; u'America/New_York' is the first item
    if x in counts:        # if u'America/New_York' in counts:
        counts[x] += 1     #     counts[u'America/New_York'] += 1
    else:                  # else:
        counts[x] = 1      #     counts[u'America/New_York'] = 1
    # and so on...
return counts
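The walk-through can also be made visible by recording which branch runs for each item. This is only a sketch in Python 3: the steps list is a helper added for illustration, and the sequence is shortened to four of the sample values.

```python
# Four of the sample values, enough to hit both branches.
sequence = ['America/New_York', 'America/Denver', 'America/New_York', '']

counts = {}
steps = []  # hypothetical helper: records what happened for each item
for x in sequence:
    if x in counts:
        counts[x] += 1
        steps.append((x, 'incremented', counts[x]))
    else:
        counts[x] = 1
        steps.append((x, 'added', counts[x]))

for step in steps:
    print(step)
# ('America/New_York', 'added', 1)
# ('America/Denver', 'added', 1)
# ('America/New_York', 'incremented', 2)
# ('', 'added', 1)
```

The first time a value shows up, the else branch runs. Every later occurrence of that value goes through the if branch.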
Upvotes: 0
Reputation: 91590
This is basically using a dictionary (or a hash table) to store how many times each time zone has occurred. Each total is stored in counts, keyed by the time zone string. This allows us to quickly look up an existing count so we can increment it by one.
First, we loop through each value in sequence:
for x in sequence:
For each iteration, x will be equal to the current value. For example, in the first iteration, x will be equal to America/New_York.
Next, we have this confusing part:
if x in counts:
    counts[x] += 1
else:
    counts[x] = 1
Since you can't increment something that doesn't exist, we need to first check if that key already exists in the map. If we've never run into that time zone before, it won't exist yet. Thus, we need to set its initial value to 1, since we know it has occurred at least one time so far.
If it already does exist (x is in counts), we just need to increment the value stored under that key by one:
counts[x] += 1
Hope this makes more sense now!
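To see why the check is needed, here is a small sketch in Python 3 of what happens without it, followed by one common alternative that uses dict.get with a default of 0. The name get_counts_with_get is made up for this example and is not from the book.

```python
counts = {}
try:
    # There is no entry yet, so there is nothing to add 1 to.
    counts['America/New_York'] += 1
except KeyError as err:
    missing_key = err.args[0]
    print('KeyError for', repr(missing_key))

def get_counts_with_get(sequence):
    """Same result as get_counts, without the explicit if/else."""
    counts = {}
    for x in sequence:
        # get(x, 0) returns the current count, or 0 if x is not a key yet.
        counts[x] = counts.get(x, 0) + 1
    return counts

print(get_counts_with_get(['a', 'b', 'a']))  # {'a': 2, 'b': 1}
```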
Upvotes: 0
Reputation: 13974
The main operation here is the dictionary lookup.
if x in counts:
This checks whether the time zone has already been counted. If it exists in the counts dictionary, its count is incremented. If it doesn't exist yet, the else branch creates a new entry and sets it to 1.
Upvotes: 1