Reputation: 48490
I'm trying to use groupby to group together files that were created on the same day. By "same day" I mean the dd part of mm/dd/yyyy, so a file created on March 1 and a file created on April 1 should be grouped together because the "1" matches. Here's the code I have so far:
#!/usr/bin/python
import os
import datetime
from itertools import groupby

def created_ymd(fn):
    ts = os.stat(fn).st_ctime
    dt = datetime.date.fromtimestamp(ts)
    return dt.year, dt.month, dt.day

def get_files():
    files = []
    for f in os.listdir(os.getcwd()):
        if not os.path.isfile(f): continue
        y,m,d = created_ymd(f)
        files.append((f, d))
    return files

files = get_files()
for key, group in groupby(files, lambda x: x[1]):
    for file in group:
        print "file: %s, date: %s" % (file[0], key)
    print " "
The problem is that files do get grouped by day, but I then see multiple groups with the same day. For example, I might get a group of 4 files created on the 17th, and later on a separate group of 2 other files that were also created on the 17th. Where am I going wrong?
Upvotes: 1
Views: 4257
Reputation: 57281
It sounds like you don't need the streaming nature of the groupby found in standard itertools. A non-streaming groupby is implemented in the toolz library.
$ pip install toolz
$ python
>>> from toolz import groupby
>>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
>>> groupby(len, names)
{3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
No sorting or fancy iterators involved.
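If installing toolz is not an option, roughly the same non-streaming behaviour can be sketched with the standard library's collections.defaultdict. This is only an illustration, not how toolz implements it; the helper name group_by and the sample (filename, day) tuples are made up for the example.

```python
from collections import defaultdict


def group_by(key, items):
    """Collect items into a dict of lists keyed by key(item).

    Hypothetical helper: the input does not need to be sorted,
    because every item is filed under its key as it is seen.
    """
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


# Hypothetical (filename, day-of-month) tuples, deliberately unsorted.
files = [("a.txt", 17), ("b.txt", 3), ("c.txt", 17), ("d.txt", 3), ("e.txt", 17)]
by_day = group_by(lambda x: x[1], files)

for day in sorted(by_day):
    print("day %s: %s" % (day, [name for name, _ in by_day[day]]))
```

The trade-off is that the whole result is held in memory at once, which is fine for a directory listing.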
Upvotes: 0
Reputation: 177971
To quote the docs: "Generally, the iterable needs to already be sorted on the same key function."
grouping = lambda x: x[1]
files.sort(key=grouping)
for key, group in groupby(files, grouping):
    ...
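As a rough, self-contained illustration of the sort-then-group pattern above, here is the same idea applied to a few hypothetical (filename, day) tuples standing in for the output of get_files(); the file names and the grouped dict are assumptions made for the example.

```python
from itertools import groupby

# Hypothetical (filename, day-of-month) tuples standing in for get_files() output.
files = [("a.txt", 17), ("b.txt", 3), ("c.txt", 17), ("d.txt", 3)]

grouping = lambda x: x[1]
# Sort on the same key that groupby will use; list.sort is stable,
# so files sharing a day keep their original relative order.
files.sort(key=grouping)

grouped = {}
for key, group in groupby(files, grouping):
    # group is a one-shot iterator, so materialize it before moving on.
    grouped[key] = [name for name, _ in group]
    print("day %s: %s" % (key, grouped[key]))
```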
Upvotes: 0
Reputation: 8734
groupby() produces a new group every time the key changes, which means you have to sort your data first in order to group all similar elements together. Try this instead:
files = sorted(get_files(), key=(lambda x: x[1]))
and then run your for loop.
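To see the "new group every time the key changes" behaviour in isolation, here is a small sketch using plain integers as hypothetical day-of-month values. Each pair is (key, number of items in that group).

```python
from itertools import groupby

days = [17, 17, 3, 17, 3]  # hypothetical day-of-month values, unsorted

# Unsorted input: a new group starts at every key change, so keys repeat.
unsorted_runs = [(key, len(list(group))) for key, group in groupby(days)]

# Sorted input: equal keys are adjacent, so each key appears exactly once.
sorted_runs = [(key, len(list(group))) for key, group in groupby(sorted(days))]

print(unsorted_runs)  # [(17, 2), (3, 1), (17, 1), (3, 1)]
print(sorted_runs)    # [(3, 2), (17, 3)]
```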
Upvotes: 2
Reputation: 19495
The list that you're feeding to groupby needs to be sorted by whatever it is that you are grouping by, in this case by dd.
Upvotes: 1