randombits

Reputation: 48490

Using the groupby method in Python, example included

I'm trying to use groupby to group together files that were created on the same day. By "same day" I mean the dd part of mm/dd/yyyy, so a file created on March 1 and a file created on April 1 should end up in the same group because the "1" matches. Here's the code I have so far:

#!/usr/bin/python
import os
import datetime
from itertools import groupby

def created_ymd(fn):
  ts = os.stat(fn).st_ctime
  dt = datetime.date.fromtimestamp(ts)
  return dt.year, dt.month, dt.day

def get_files():
  files = []
  for f in os.listdir(os.getcwd()):
    if not os.path.isfile(f): continue
    y,m,d = created_ymd(f)
    files.append((f, d))
  return files

files = get_files()
for key, group in groupby(files, lambda x: x[1]):
  for file in group:
    print "file: %s, date: %s" % (file[0], key)
  print " "

The problem is that files do get grouped by day, but the same day shows up in more than one group. For example, I might get a group of 4 files created on the 17th, and later on a separate group of 2 more files that were also created on the 17th. Where am I going wrong?

Upvotes: 1

Views: 4257

Answers (4)

MRocklin

Reputation: 57281

It sounds like you don't need the streaming nature of the groupby found in standard itertools. A non-streaming groupby is implemented in the toolz library.

$ pip install toolz
$ python

>>> from toolz import groupby

>>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
>>> groupby(len, names)
{3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}

No sorting or fancy iterators involved.
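If installing a third-party package is not an option, the same non-streaming behaviour can be approximated with the standard library. This is only a minimal sketch, not how toolz implements it; the helper name `group_by_key` is made up for illustration.

```python
from collections import defaultdict

def group_by_key(key, items):
    """Collect items into lists keyed by key(item); the input need not be sorted."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
by_length = group_by_key(len, names)
print(by_length)
```

Because every item is dropped into its bucket as it is seen, the order of the input does not affect which group an item lands in.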

Upvotes: 0

Mark Tolonen

Reputation: 177971

To quote the docs: "Generally, the iterable needs to already be sorted on the same key function."

grouping = lambda x: x[1]
files.sort(key=grouping)
for key, group in groupby(files, grouping):
    ...

Upvotes: 0

Etaoin

Reputation: 8734

groupby() produces a new group every time the key changes, which means you have to sort your data first in order to group all similar elements together. Try this instead:

files = sorted(get_files(), key=(lambda x: x[1]))

and then run your for loop.
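To see the effect in isolation, here is a small sketch (Python 3) that runs groupby on the same data unsorted and sorted. The `(filename, day)` tuples are invented sample data standing in for the output of `get_files()`.

```python
from itertools import groupby

# Hypothetical (filename, day) pairs standing in for get_files() output.
files = [("a.txt", 17), ("b.txt", 3), ("c.txt", 17), ("d.txt", 3)]

def day(item):
    return item[1]

# Unsorted input: a new group starts at every key change, so days repeat.
unsorted_keys = [key for key, _ in groupby(files, key=day)]

# Input sorted by the same key: each day appears exactly once.
sorted_groups = {
    key: [name for name, _ in group]
    for key, group in groupby(sorted(files, key=day), key=day)
}

print(unsorted_keys)   # [17, 3, 17, 3]
print(sorted_groups)   # {3: ['b.txt', 'd.txt'], 17: ['a.txt', 'c.txt']}
```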

Upvotes: 2

T. Stone

Reputation: 19495

The list that you're feeding to groupby needs to be sorted by whatever it is that you are grouping by, in this case by dd.
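Applied to the script in the question, that could look roughly like the sketch below (Python 3). The function names `created_day` and `group_files_by_day` are illustrative, not from the original code. Also note that `st_ctime` is the creation time on Windows but the last metadata change on most Unix systems, so "created" is an assumption carried over from the question.

```python
import datetime
import os
from itertools import groupby

def created_day(path):
    # st_ctime: creation time on Windows, metadata-change time on most Unix systems.
    ts = os.stat(path).st_ctime
    return datetime.date.fromtimestamp(ts).day

def group_files_by_day(directory):
    """Return {day_of_month: [filenames]} for the regular files in directory."""
    entries = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, created_day(path)))
    # Sort by the grouping key first, so equal days sit next to each other.
    entries.sort(key=lambda entry: entry[1])
    return {
        day: [name for name, _ in group]
        for day, group in groupby(entries, key=lambda entry: entry[1])
    }

groups = group_files_by_day(os.getcwd())
for day, names in groups.items():
    print("day %s: %s" % (day, ", ".join(names)))
```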

Upvotes: 1
