Caitlin


Python List Frequency

I have to figure out how to print a frequency table. So far this is my code, but it keeps skipping the first number in the list. I assume that's because I have previous starting at data[0], but I don't know how else to fix that.

def frequencies(data):

    data.sort()

    count = 0
    previous = data[0]

    print("data\tfrequency") # '\t' is the TAB character

    for d in data:
        if d == previous:
            # same as the previous, so just increment the count
            count += 1
        else:
            # we've found a new item so print out the old and reset the count
            print(str(previous) + "\t" + str(count))
            count = 1

        previous = d

Upvotes: 1

Views: 367

Answers (4)

ofer.sheffer

Reputation: 5677

To skip the first item (per the linked source), use itertools.islice:

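from itertools import islice

cars = ["audi", "bmw", "fiat"]  # example list
for car in islice(cars, 1, None):  # start at index 1, i.e. skip the first item
    print(car)  # do something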

For counting consecutive values, itertools.groupby() as used by 200_success (on sorted(data)) doesn't do the trick, and Counter() doesn't either, since both then give the overall count rather than the length of each adjacent run. However, the question says 'frequency', and that CAN be counted with Counter() or groupby().
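To illustrate the difference, here is a small sketch on a made-up list: itertools.groupby() on the unsorted list yields the length of each adjacent run, while on sorted data it yields overall frequencies, the same totals collections.Counter gives.

```python
from collections import Counter
from itertools import groupby

data = [1, 1, 2, 1, 3, 3]  # example input, not from the question

# groupby() only merges *adjacent* equal items, so on unsorted data
# it reports run lengths: 1 shows up as two separate runs here.
runs = [(key, sum(1 for _ in group)) for key, group in groupby(data)]

# After sorting, equal items are adjacent, so each group is a full frequency.
freqs = [(key, sum(1 for _ in group)) for key, group in groupby(sorted(data))]

# Counter ignores order entirely and gives the same totals as the sorted groupby.
counts = Counter(data)

print(runs)                    # [(1, 2), (2, 1), (1, 1), (3, 2)]
print(freqs)                   # [(1, 3), (2, 1), (3, 2)]
print(sorted(counts.items()))  # [(1, 3), (2, 1), (3, 2)]
```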

A third alternative would be to use a dict keyed by value, so each count can be looked up directly (in constant time on average):

from collections import defaultdict

appearances = defaultdict(int)  # missing keys start at 0
for curr in data:
    appearances[curr] += 1
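A minimal sketch of a complete frequencies() built on that idea, assuming you want the same tab-separated output as the question, sorted by value. The function name and header come from the question; returning the counts as a plain dict is an illustrative addition so the result can be reused.

```python
from collections import defaultdict

def frequencies(data):
    """Print a value/frequency table for data without modifying the list."""
    appearances = defaultdict(int)  # missing keys start at 0
    for curr in data:
        appearances[curr] += 1

    print("data\tfrequency")
    for value in sorted(appearances):  # sort the distinct values, not the input list
        print("{}\t{}".format(value, appearances[value]))
    return dict(appearances)

frequencies([3, 1, 2, 3, 3])
```

Because nothing is printed inside the counting loop, there is no first-item or last-item special case, and an empty list just prints the header.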

Upvotes: 0

200_success

Reputation: 7582

Your diagnosis is correct. The first time through the loop, if d == previous will always be True, so the first group never gets printed. (Or, even worse, if the list is empty, then previous = data[0] crashes.)


The simple way to get the job done is to use itertools.groupby(). Look at the linked documentation to see how it could be implemented.

import itertools

for datum, group in itertools.groupby(sorted(data)):
    print('{0}\t{1}'.format(datum, len(list(group))))
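Wrapped into the question's function, a sketch might look like this. It counts each group with sum(1 for _ in group) instead of len(list(group)), which avoids building a temporary list; either works. Returning the (value, count) pairs is an illustrative addition for reuse and testing.

```python
import itertools

def frequencies(data):
    """Print a tab-separated frequency table; return (value, count) pairs."""
    print("data\tfrequency")
    table = []
    # sorted() returns a new list, so the caller's list is left untouched,
    # and it puts equal values next to each other for groupby().
    for datum, group in itertools.groupby(sorted(data)):
        count = sum(1 for _ in group)  # consume the group iterator to count it
        print("{0}\t{1}".format(datum, count))
        table.append((datum, count))
    return table

frequencies([1, 2, 3, 4, 2, 2, 3, 5])
```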

In addition, I am suggesting:

  • Changing data.sort() to sorted(data), so as to avoid having the caller see the side effect of altering the list order.
  • Using str.format() instead of concatenation with two explicit str() type conversions.
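A quick illustration of both points, with arbitrary values: list.sort() reorders the caller's list in place and returns None, while sorted() leaves it alone and returns a new list; str.format() does the conversion that the explicit str() calls were doing.

```python
original = [3, 1, 2]

new_list = sorted(original)  # returns a new sorted list
print(original, new_list)    # [3, 1, 2] [1, 2, 3]

result = original.sort()     # sorts in place and returns None
print(original, result)      # [1, 2, 3] None

# str.format() converts its arguments itself, so no str() calls are needed.
print("{}\t{}".format(2, 3) == str(2) + "\t" + str(3))  # True
```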

If you wanted to salvage your existing implementation, the quick fix would be to add a special case for the first pass:

for i, d in enumerate(data):
    if i > 0 and d == previous:
        …

You wouldn't even have to initialize count and previous.

Upvotes: 3

Josh Smeaton

Reputation: 48720

Python's standard library comes with a Counter type (collections.Counter) for counting frequencies for you. This doesn't fix the bug in the original code, but it does what you want it to do.

>>> from collections import Counter
>>> data = [1,2,3,4,2,2,3,5]
>>> c = Counter(data)
>>> c
Counter({2: 3, 3: 2, 1: 1, 4: 1, 5: 1})
>>> for key in sorted(c.keys()):
...     print('{}\t{}'.format(key, c[key]))
...
1   1
2   3
3   2
4   1
5   1
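If you want to keep the question's output format, a minimal sketch around Counter could look like this; the function shape and the returned Counter are illustrative choices. It needs no special handling for the first or last item, and an empty list simply prints the header. Counter.most_common() is also handy if you ever want the values ordered by frequency instead of by value.

```python
from collections import Counter

def frequencies(data):
    """Print a tab-separated frequency table using collections.Counter."""
    counts = Counter(data)  # value -> number of occurrences
    print("data\tfrequency")
    for key in sorted(counts):  # iterating a Counter yields its keys
        print("{}\t{}".format(key, counts[key]))
    return counts

counts = frequencies([1, 2, 3, 4, 2, 2, 3, 5])
print(counts.most_common(1))  # [(2, 3)]
```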

Upvotes: 4

Andrew Shirley

Reputation: 417

Are you sure it's skipping the first one and not the last one? Right now it looks like it's ONLY printing information when you cross over from one data value to another. So if the entire list is one data value (e.g. a bunch of 1s), you'll never hit the "else" branch and never print anything.

You can get around this simply by printing the previous value and count one final time after the loop has completed.
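Applied to the question's code, that fix could look like the sketch below. Besides the final print after the loop, it guards against an empty list (where data[0] would raise IndexError) and uses sorted() instead of data.sort() so the caller's list isn't reordered; those two changes are extras beyond this answer's suggestion. The returned list of pairs is also an addition, for testing.

```python
def frequencies(data):
    """Print value/frequency pairs by scanning a sorted copy of data."""
    print("data\tfrequency")  # '\t' is the TAB character
    rows = []
    if not data:  # nothing to count; data[0] below would raise IndexError
        return rows

    data = sorted(data)  # sorted copy, so the caller's list is not reordered
    count = 0
    previous = data[0]

    for d in data:
        if d == previous:
            # same as the previous, so just increment the count
            count += 1
        else:
            # we've found a new item so print out the old and reset the count
            print(str(previous) + "\t" + str(count))
            rows.append((previous, count))
            count = 1
        previous = d

    # the loop only prints when the value changes, so print the last group here
    print(str(previous) + "\t" + str(count))
    rows.append((previous, count))
    return rows
```

With this change, frequencies([1, 1, 1]) prints the header followed by 1 and 3 separated by a tab, which the original version never printed.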

Your first value should still be counted because you're initializing "previous" to the first value in data, so when you enter the loop, d == previous and you increment the count. That part looks like it'll do what you expect it to do.
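A small trace of the question's loop on an example input (values chosen for illustration) shows both points: the first 1 is counted on the first pass, and the final group (the 2) is never reported because nothing prints after the loop ends. The trace list is a hypothetical stand-in for the print calls.

```python
data = sorted([2, 1, 1])  # example input, becomes [1, 1, 2]
count = 0
previous = data[0]
trace = []

for d in data:
    if d == previous:
        count += 1
    else:
        trace.append(("print", previous, count))  # where the original prints
        count = 1
    previous = d
    trace.append((d, count))  # state after handling d

for step in trace:
    print(step)
```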

If this isn't right, could you provide a simple input/output?

Upvotes: 0
