I am trying to use OrderedDict() to count occurrences of a word. My data is organized by day, and I want to count how many lines mention 'foo' on each day. Each line starts with its day. Using defaultdict gives me what I want but, of course, without the ordering:
import re
from collections import defaultdict

counter = defaultdict(int)
with open('file.txt', 'r') as w:
    for line in w:
        # first 6 chars are the day ("Sep 1,"), the rest holds the message
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            counter[day] += 1
If I use OrderedDict, how can I do the same thing so that the data stays in the order it is read? If I use

for key, value in sorted(counter.items()):
    print(key, value)

then I get the list in alphabetical order. I know I could read the days into a list and then iterate over the keys in that order, but that seems inefficient.
Suppose my text file looks like this:
Sep 1, 2014, 22:23 - ######: Here is a foo
Sep 1, 2014, 22:23 - ######: Not here
Sep 2, 2014, 19:09 - ######: foo sure
Sep 2, 2014, 19:57 - ######: footastic
Sep 2, 2014, 19:57 - ######: foo-king awesome
Sep 2, 2014, 19:57 - ######: No esta aqui
I want my dictionary to print:
('Sep 1,', 1)
('Sep 2,', 3)
You can check whether day is already in the OrderedDict. If so, increment its count; if not, set it to 1.
import re
from collections import OrderedDict

counter = OrderedDict()
with open('file.txt', 'r') as w:
    for line in w:
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            if day in counter:
                counter[day] += 1
            else:
                # first time this day is seen: it is appended at the end
                counter[day] = 1
Of course, the OrderedDict will then be ordered by the first occurrence of each day in your source text file.
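On Python 3.7 and later, plain dicts, and therefore defaultdict, are guaranteed to preserve insertion order, so a defaultdict(int) counter already gives this same first-occurrence order as long as you iterate over counter.items() directly rather than through sorted(). A minimal sketch, assuming Python 3.7+; the "Oct 1" line is hypothetical, added so that file order and alphabetical order disagree:

```python
import re
from collections import defaultdict

# Sample lines in the question's format; the "Oct 1" line is hypothetical,
# added so that file order and alphabetical order disagree.
lines = [
    "Sep 1, 2014, 22:23 - ######: Here is a foo",
    "Sep 2, 2014, 19:09 - ######: foo sure",
    "Oct 1, 2014, 08:00 - ######: more foo",
]

counter = defaultdict(int)
for line in lines:
    day, words = line[:6], line[14:]
    if re.search(r"foo", words):
        counter[day] += 1

# Python 3.7+: iteration follows first insertion, i.e. file order.
print(list(counter.items()))
# [('Sep 1,', 1), ('Sep 2,', 1), ('Oct 1,', 1)]

# sorted() compares the key strings, which is the alphabetical order seen in the question.
print(sorted(counter.items()))
# [('Oct 1,', 1), ('Sep 1,', 1), ('Sep 2,', 1)]
```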
Alternatively, you might consider parsing the date into a datetime.date object and using that as the key of your defaultdict. Sorting on the keys then lists all items in chronological order, regardless of the order in which they appear in your source text file.
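A minimal sketch of the date-keyed approach, assuming every line starts with a timestamp like "Sep 1, 2014, 22:23" followed by " - " (as in the question's sample). The function name count_foo_by_date and the extra "Oct 1" line are hypothetical, for illustration only:

```python
import re
from collections import defaultdict
from datetime import datetime

def count_foo_by_date(lines):
    """Count lines whose message mentions 'foo', keyed by datetime.date.

    Assumes each line starts with "Mon D, YYYY, HH:MM - " as in the
    question's sample; a malformed line makes strptime raise ValueError.
    """
    counter = defaultdict(int)
    for line in lines:
        stamp, _, message = line.partition(" - ")
        when = datetime.strptime(stamp, "%b %d, %Y, %H:%M")
        if re.search(r"foo", message):
            counter[when.date()] += 1
    return counter

# Lines from the question plus a hypothetical "Oct 1" line placed first,
# to show that sorting by date ignores the order of the file.
lines = [
    "Oct 1, 2014, 08:00 - ######: more foo",
    "Sep 1, 2014, 22:23 - ######: Here is a foo",
    "Sep 1, 2014, 22:23 - ######: Not here",
    "Sep 2, 2014, 19:09 - ######: foo sure",
    "Sep 2, 2014, 19:57 - ######: footastic",
]

for day, n in sorted(count_foo_by_date(lines).items()):
    print(day, n)
# 2014-09-01 1
# 2014-09-02 2
# 2014-10-01 1
```

Because date objects compare chronologically, sorted() gives date order, whereas the string keys "Oct 1," and "Sep 1," would sort alphabetically. The %b directive matches English month abbreviations under the default C locale.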
As @user2357112 pointed out in a comment, you can simplify the increment logic with dict.get(), like this:
import re
from collections import OrderedDict

counter = OrderedDict()
with open('file.txt', 'r') as w:
    for line in w:
        day, words = line[:6], line[14:]
        if re.search(r"foo", words):
            # get() returns 0 for a day not seen yet
            counter[day] = counter.get(day, 0) + 1
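To check the get() version against the expected output in the question, here is a small sketch that wraps the loop in a function (count_foo is a hypothetical name) and feeds it the sample file through io.StringIO, which stands in for open('file.txt'):

```python
import io
import re
from collections import OrderedDict

def count_foo(lines):
    """Count lines mentioning 'foo' per day, keeping days in first-seen order."""
    counter = OrderedDict()
    for line in lines:
        day, words = line[:6], line[14:]  # fixed-width slices, as in the question
        if re.search(r"foo", words):
            counter[day] = counter.get(day, 0) + 1
    return counter

# The question's sample file; io.StringIO stands in for open('file.txt').
SAMPLE_TEXT = (
    "Sep 1, 2014, 22:23 - ######: Here is a foo\n"
    "Sep 1, 2014, 22:23 - ######: Not here\n"
    "Sep 2, 2014, 19:09 - ######: foo sure\n"
    "Sep 2, 2014, 19:57 - ######: footastic\n"
    "Sep 2, 2014, 19:57 - ######: foo-king awesome\n"
    "Sep 2, 2014, 19:57 - ######: No esta aqui\n"
)

result = count_foo(io.StringIO(SAMPLE_TEXT))
for item in result.items():
    print(item)
# ('Sep 1,', 1)
# ('Sep 2,', 3)
```

With a real file, count_foo(f) inside a with open('file.txt') as f: block works the same way. Note that the fixed-width slice line[:6] only fits single-digit days; "Sep 10, 2014" would yield the key "Sep 10".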
You can define your own class that inherits from both defaultdict and OrderedDict.
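from collections import defaultdict, OrderedDict

# Needs a pure-Python OrderedDict (Python 2); CPython 3.5+ raises TypeError here.
class OrderedDefaultDict(defaultdict, OrderedDict):
    def __init__(self, default, *args, **kwargs):
        defaultdict.__init__(self, default)          # stores the default factory
        OrderedDict.__init__(self, *args, **kwargs)  # sets up the ordering state

counter = OrderedDefaultDict(int)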
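On CPython 3.5 and later, where OrderedDict is implemented in C, a class with both defaultdict and OrderedDict as bases fails when it is defined, with a TypeError about an instance lay-out conflict. A commonly used alternative (a different technique from the two-base class, sketched here as an assumption rather than a library API) subclasses only OrderedDict and reproduces defaultdict's behaviour through the __missing__ hook, which dict.__getitem__ calls for missing keys:

```python
from collections import OrderedDict

class OrderedDefaultDict(OrderedDict):
    """OrderedDict that fills in missing keys via default_factory,
    mimicking defaultdict through the __missing__ hook."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

counter = OrderedDefaultDict(int)
for day in ["Sep 1,", "Sep 2,", "Sep 2,"]:
    counter[day] += 1

print(list(counter.items()))  # [('Sep 1,', 1), ('Sep 2,', 2)]
```

Because new keys are inserted the first time they are looked up, counter[day] += 1 keeps days in first-seen order, just like the explicit membership check.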