Reputation:
I have to figure out how to print a frequency table. So far this is my code, but it keeps skipping the first number in the list. I assume that's because I start previous at data[0], but I don't know how else to fix that.
def frequencies(data):
    data.sort()
    count = 0
    previous = data[0]
    print("data\tfrequency") # '\t' is the TAB character
    for d in data:
        if d == previous:
            # same as the previous, so just increment the count
            count += 1
        else:
            # we've found a new item so print out the old and reset the count
            print(str(previous) + "\t" + str(count))
            count = 1
            previous = d
Upvotes: 1
Views: 367
Reputation: 5677
To skip the first item of an iterable, you can use itertools.islice:
from itertools import islice

for car in islice(cars, 1, None):  # yields cars[1], cars[2], ... (cars[0] is skipped)
    pass  # do something with car
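Applied to the question's loop, skipping the first item means seeding previous with data[0] and count with 1, then iterating over the rest. A minimal sketch, assuming a non-empty list; the helper name frequency_rows is hypothetical, and the append after the loop is still needed, or the last group is lost:

```python
from itertools import islice


def frequency_rows(data):
    """Hypothetical helper: return (value, count) pairs for a non-empty list."""
    data = sorted(data)
    previous = data[0]
    count = 1  # data[0] is already counted, so the loop starts at index 1
    rows = []
    for d in islice(data, 1, None):  # every item except data[0]
        if d == previous:
            count += 1
        else:
            rows.append((previous, count))
            previous = d
            count = 1
    rows.append((previous, count))  # emit the last group once the loop ends
    return rows


for value, count in frequency_rows([3, 1, 2, 2, 3, 3]):
    print(str(value) + "\t" + str(count))
```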
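For counting consecutive (adjacent) runs of equal values, the itertools.groupby(sorted(data)) approach suggested by 200_success doesn't do the trick, and neither does Counter(): sorting puts all equal values next to each other, so both give overall counts rather than run lengths. (groupby() on the unsorted data would count runs.) However, the question asks for 'frequency', and that CAN be counted with Counter() or groupby().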
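To illustrate the distinction with a made-up sample list: groupby() on the unsorted data yields run lengths of adjacent equal values, while groupby() on sorted data and Counter() both give overall counts.

```python
from collections import Counter
from itertools import groupby

data = [1, 1, 2, 1, 3, 3]  # sample data, chosen so that 1 appears in two separate runs

# groupby() only merges *adjacent* equal items, so unsorted input gives run lengths
runs = [(key, len(list(group))) for key, group in groupby(data)]

# sorting first puts all equal items next to each other, giving overall counts
totals = [(key, len(list(group))) for key, group in groupby(sorted(data))]

# Counter() ignores order entirely and also gives overall counts
counts = sorted(Counter(data).items())

print(runs)    # [(1, 2), (2, 1), (1, 1), (3, 2)]
print(totals)  # [(1, 3), (2, 1), (3, 2)]
print(counts)  # [(1, 3), (2, 1), (3, 2)]
```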
A third alternative would be a dict, which looks up a value's count by key in constant time on average:
from collections import defaultdict

appearances = defaultdict(int)  # missing keys start at 0
for curr in data:  # data is the list from the question
    appearances[curr] += 1
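Printing the question's table from the defaultdict could look like the minimal sketch below; the helper name frequency_table is hypothetical. Dicts keep insertion order rather than value order, so the keys are sorted before printing (an assumption about the desired output order).

```python
from collections import defaultdict


def frequency_table(data):
    """Hypothetical helper: count how often each value appears."""
    appearances = defaultdict(int)  # missing keys start at int() == 0
    for curr in data:
        appearances[curr] += 1
    return appearances


data = [1, 2, 3, 4, 2, 2, 3, 5]
table = frequency_table(data)
print("data\tfrequency")
for value in sorted(table):  # sort the keys so the table is in value order
    print(str(value) + "\t" + str(table[value]))
```

Unlike the question's loop, this needs neither a sorted list nor a previous variable, and an empty list simply produces an empty table instead of crashing on data[0].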
Upvotes: 0
Reputation: 7582
Your diagnosis is correct. The first time through the loop, if d == previous will always be True, so the first group never gets printed. (Or, even worse, if the list is empty, then previous = data[0] crashes with an IndexError.)
The simplest way to get the job done is to use itertools.groupby(). Its documentation includes a roughly equivalent pure-Python implementation, if you want to see how it works.
import itertools

for datum, group in itertools.groupby(sorted(data)):
    print('{0}\t{1}'.format(datum, len(list(group))))
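Wrapped in the question's frequencies(data) signature, the groupby() approach might look like this sketch. It returns the rows as well as printing them (the return value is an addition, for checking), and because it uses sorted(data), the caller's list keeps its original order. An empty list just prints the header.

```python
import itertools


def frequencies(data):
    """Print a value/frequency table; also return the rows (added for checking)."""
    rows = []
    print("data\tfrequency")  # '\t' is the TAB character
    # sorted() returns a new list, so the caller's list is left untouched
    for datum, group in itertools.groupby(sorted(data)):
        count = len(list(group))
        rows.append((datum, count))
        print('{0}\t{1}'.format(datum, count))
    return rows


numbers = [3, 1, 2, 2, 3, 3]
frequencies(numbers)
print(numbers)  # still [3, 1, 2, 2, 3, 3]
```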
In addition, I suggest:
- sorted(data) instead of data.sort(), so that the caller doesn't see the side effect of their list being reordered.
- str.format() instead of concatenation with two explicit str() type conversions.

If you wanted to salvage your existing implementation, the quick fix would be to add an exception for the first pass:
for i, d in enumerate(data):
    if i > 0 and d == previous:
        …
You wouldn't even have to initialize count and previous.
Upvotes: 3
Reputation: 48720
Python's standard library comes with a Counter type (in the collections module) that counts frequencies for you. This doesn't fix the bug in your original code, but it does what you want the code to do.
>>> from collections import Counter
>>> data = [1,2,3,4,2,2,3,5]
>>> c = Counter(data)
>>> c
Counter({2: 3, 3: 2, 1: 1, 4: 1, 5: 1})
>>> for key in sorted(c.keys()):
...     print('{}\t{}'.format(key, c[key]))
...
1 1
2 3
3 2
4 1
5 1
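The same idea packaged with the question's output format might look like this minimal sketch; sorted(c.items()) replaces the separate c[key] lookup, and the function also returns the rows (an addition, for checking).

```python
from collections import Counter


def frequencies(data):
    """Print and return (value, count) rows in ascending value order."""
    c = Counter(data)
    rows = sorted(c.items())  # items() gives (value, count) pairs; sort by value
    print("data\tfrequency")
    for key, count in rows:
        print('{}\t{}'.format(key, count))
    return rows


frequencies([1, 2, 3, 4, 2, 2, 3, 5])
```

If you want the most frequent values first instead, Counter.most_common() returns the (value, count) pairs ordered by count, highest first.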
Upvotes: 4
Reputation: 417
Are you sure it's skipping the first one and not the last one? Right now it looks like it's ONLY printing information when you cross over from one data value to another. So if the entire list is one value (e.g. a bunch of 1s), you'll never hit the "else" branch and never print anything.
You can get around this simply by printing the previous value and count one final time after the loop has completed.
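Applied to the question's code, that fix is one extra print after the loop. A minimal sketch: the early return for an empty list is an addition (previous = data[0] would otherwise raise an IndexError), and so is the returned list of rows, which is only there so the output can be checked.

```python
def frequencies(data):
    rows = []  # (value, count) pairs, returned for checking (an addition)
    print("data\tfrequency")  # '\t' is the TAB character
    if not data:
        return rows  # nothing to count; data[0] would raise IndexError
    data.sort()
    count = 0
    previous = data[0]
    for d in data:
        if d == previous:
            # same as the previous, so just increment the count
            count += 1
        else:
            # we've found a new item so print out the old and reset the count
            rows.append((previous, count))
            print(str(previous) + "\t" + str(count))
            count = 1
            previous = d
    # the loop only prints when the value changes, so print the last group here
    rows.append((previous, count))
    print(str(previous) + "\t" + str(count))
    return rows


frequencies([2, 1, 1])
```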
Your first value should still be counted because you're initializing "previous" to the first value in data, so when you enter the loop, d == previous and you increment the count. That part looks like it'll do what you expect it to do.
If this isn't right, could you provide a simple input/output?
Upvotes: 0