superhero

Reputation: 305

Using OrderedDict to count instances

I am trying to use OrderedDict() to keep track of instances of a word. My data is organized by day, and I want to count the number of lines containing 'foo' on each day. Each line starts with the day. Using defaultdict gives me the counts I want but, of course, without the ordering:

import re
from collections import defaultdict

counter = defaultdict(int)

with open('file.txt', 'r') as w:
    for line in w:
        # day is the first 6 characters, e.g. 'Sep 1,'; words is the rest from index 14
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            counter[day] += 1

If I use OrderedDict, how can I do the same thing so that the data stays in the order in which it is read? If I use

for key, value in sorted(counter.items()):
    print(key, value)

then I get the list in alphabetical order. I know I could read the days into a list and then iterate over the keys in that order, but this seems very inefficient.

Suppose my text file looks like this:

Sep 1, 2014, 22:23 - ######: Here is a foo
Sep 1, 2014, 22:23 - ######: Not here
Sep 2, 2014, 19:09 - ######: foo sure
Sep 2, 2014, 19:57 - ######: footastic
Sep 2, 2014, 19:57 - ######: foo-king awesome
Sep 2, 2014, 19:57 - ######: No esta aqui

I want my dictionary to print:

('Sep 1,', 1)
('Sep 2,', 3)

Upvotes: 1

Views: 1185

Answers (2)

Waylan

Reputation: 42647

You can check whether day is already in the OrderedDict. If it is, increment its count; if not, set it to 1.

import re
from collections import OrderedDict

counter = OrderedDict()

with open('file.txt', 'r') as w:
    for line in w:
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            if day in counter:
                counter[day] += 1
            else:
                # First matching line for this day: start its count at 1
                counter[day] = 1

Of course, the OrderedDict will then be ordered by the first occurrence of each day in your source text file.

Instead, you might consider parsing the date into a datetime.date object and using that as the key in your defaultdict. Then you can sort by key and get all items in date order, regardless of the order in which they appear in your source text file.
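
A minimal sketch of that idea, not a drop-in solution. It assumes every line has the shape shown in the question ("Sep 1, 2014, 22:23 - ..."), that month abbreviations are parsed under an English locale, and it uses an in-memory list named lines (a stand-in for file.txt, deliberately out of date order) so it can run on its own:

```python
import re
from collections import defaultdict
from datetime import datetime

# Stand-in for the lines of file.txt, deliberately not in date order
lines = [
    "Sep 2, 2014, 19:09 - ######: foo sure",
    "Sep 1, 2014, 22:23 - ######: Here is a foo",
    "Sep 1, 2014, 22:23 - ######: Not here",
    "Sep 2, 2014, 19:57 - ######: footastic",
]

counter = defaultdict(int)
for line in lines:
    # Everything before the first " - " is the timestamp, the rest is the message
    stamp, _, words = line.partition(" - ")
    day = datetime.strptime(stamp, "%b %d, %Y, %H:%M").date()
    if re.search(r"foo", words):
        counter[day] += 1

# Sorting date keys gives chronological order, not alphabetical order
for day, count in sorted(counter.items()):
    print(day, count)
```

Splitting on " - " rather than slicing at fixed positions also avoids trouble with two-digit days, where line[:6] would no longer cover the same characters.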


As @user2357112 pointed out in a comment, you could make the logic simpler when incrementing the counter. Like this:

import re
from collections import OrderedDict

counter = OrderedDict()

with open('file.txt', 'r') as w:
    for line in w:
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            # get() returns 0 the first time a day is seen
            counter[day] = counter.get(day, 0) + 1
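
If you are on Python 3.7 or later, the built-in dict (and therefore defaultdict) keeps keys in insertion order, so the original defaultdict approach should already give first-seen ordering. A small sketch under that assumption, with the sample lines from the question held in a list named lines instead of being read from file.txt:

```python
import re
from collections import defaultdict

# Sample data from the question, used here in place of file.txt
lines = [
    "Sep 1, 2014, 22:23 - ######: Here is a foo",
    "Sep 1, 2014, 22:23 - ######: Not here",
    "Sep 2, 2014, 19:09 - ######: foo sure",
    "Sep 2, 2014, 19:57 - ######: footastic",
    "Sep 2, 2014, 19:57 - ######: foo-king awesome",
    "Sep 2, 2014, 19:57 - ######: No esta aqui",
]

counter = defaultdict(int)
for line in lines:
    day, words = line[:6], line[14:]
    if re.search(r"foo", words):
        counter[day] += 1

# On Python 3.7+, iteration follows the order in which keys were first inserted
for item in counter.items():
    print(item)
```

On older interpreters that ordering is not guaranteed, which is where OrderedDict is still needed.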

Upvotes: 1

Yossi

Reputation: 12110

You can define your own class that inherits from both defaultdict and OrderedDict.

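from collections import defaultdict, OrderedDict

class OrderedDefaultDict(defaultdict, OrderedDict):
    def __init__(self, default, *args, **kwargs):
        # Initialise both bases explicitly: the default factory, then the ordered storage
        defaultdict.__init__(self, default)
        OrderedDict.__init__(self, *args, **kwargs)

counter = OrderedDefaultDict(int)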
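
One caveat: on recent CPython 3 releases, where OrderedDict is implemented in C, inheriting from both classes may be rejected with "TypeError: multiple bases have instance lay-out conflict". A variant that avoids multiple inheritance is to subclass OrderedDict alone and supply __missing__, the hook that dict subclasses can define for absent keys. This is only a sketch; the default_factory attribute name simply mirrors defaultdict and is my own choice here:

```python
from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    """OrderedDict that creates missing values from a factory (illustrative sketch)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict lookup when key is absent
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value

counter = OrderedDefaultDict(int)
for day in ['Sep 2,', 'Sep 1,', 'Sep 2,']:
    counter[day] += 1

print(list(counter.items()))
```

The keys come back in first-seen order, so 'Sep 2,' is listed before 'Sep 1,' here. The sketch does not handle copying or pickling, which a fuller version would need to cover.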

Upvotes: 0
