SQLAlchemy: Iterate over each subset of a query result set partitioned by ordering parameters

Question

This is a fairly basic question that isn't specific to SQLAlchemy (I ran into the same thing when I played with MySQL-python), but SQLAlchemy is the library I'm currently working with.

Say I execute a query that returns the contents of a fairly large table, ordered by a certain attribute. In my case I'm fetching benchmark measurements from a table that references the processor on which each measurement was recorded.

So what I have is:

measurements = session.query(Measurement)\
    .join(Processor)\
    .order_by(Processor.name)

Now what I would like to do is iterate over the result set, but in terms of the subsets defined by the different processor names. Is there any convenient way to do this partitioning without a lot of boilerplate code?

Naively I would write something like

for proc_name, sublist in gen_partitions(measurements.all()):
    set_up_some_stuff(proc_name)
    for meas in sublist:
        process(meas)

which means I have to implement a generator function gen_partitions:

def gen_partitions(measurements):
    i = 0
    while i < len(measurements):
        # Start a new partition with the current measurement.
        m = measurements[i]
        plist = [m]
        i += 1
        # Collect the following measurements that share its processor name.
        while i < len(measurements) and \
                measurements[i].processor.name == m.processor.name:
            plist.append(measurements[i])
            i += 1

        yield m.processor.name, plist

Feels like a lot of boilerplate. Is there a better way to do it?

Alex Martelli · Accepted Answer

import itertools

for proc_name, ms in itertools.groupby(measurements, lambda m: m.processor.name):
    set_up_some_stuff(proc_name)
    for meas in ms:
        process(meas)

would appear to meet your requirements -- any reason you haven't considered the standard library module itertools?
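Worth keeping in mind: groupby only merges consecutive items with equal keys, so it relies on the order_by(Processor.name) in your query. Here is a minimal, self-contained sketch of that behaviour. The namedtuples are hypothetical stand-ins for your mapped Processor and Measurement classes, and the names and values are made up for illustration:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-ins for the mapped classes in the question.
Processor = namedtuple("Processor", "name")
Measurement = namedtuple("Measurement", "processor value")

# Rows in the order an order_by(Processor.name) query would return them.
rows = [
    Measurement(Processor("arm"), 1.0),
    Measurement(Processor("arm"), 1.5),
    Measurement(Processor("x86"), 2.0),
]

# attrgetter accepts dotted names, so this is equivalent to
# lambda m: m.processor.name
by_proc_name = attrgetter("processor.name")

# One partition per processor name, because equal keys are adjacent.
partitions = {}
for proc_name, ms in groupby(rows, key=by_proc_name):
    partitions[proc_name] = [m.value for m in ms]

# Without the ordering, the same key shows up in several separate groups.
shuffled = [rows[0], rows[2], rows[1]]
shuffled_keys = [name for name, _ in groupby(shuffled, key=by_proc_name)]

print(partitions)     # {'arm': [1.0, 1.5], 'x86': [2.0]}
print(shuffled_keys)  # ['arm', 'x86', 'arm']
```

With a real query, iterating over measurements directly (as in the loop above) should behave the same way as iterating over rows here, as long as the ordering is in place.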

Note that I've renamed the sublist to ms because it's an iterator, not a list. If you do need to have those measurements in a list (in order to do something other than just looping over them, etc.), that's easily achieved too: just add, in the body of the outer for loop, a

    sublist = list(ms)
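One reason to make that copy right away, inside the loop body: each group iterator shares the underlying iterator with groupby itself, so a group can no longer be read once groupby has moved on to the next one. A small sketch with made-up string data (the values and the first-character key are purely illustrative):

```python
from itertools import groupby

data = ["a1", "a2", "b1"]
groups = groupby(data, key=lambda s: s[0])

key_a, group_a = next(groups)
# Advancing to the next group skips past the remaining "a" items.
key_b, group_b = next(groups)

stale = list(group_a)  # the earlier group is no longer available
fresh = list(group_b)  # the current group can still be read

print(stale)  # []
print(fresh)  # ['b1']
```

So if the measurements of one processor are needed after the loop has moved on to the next processor, take the list copy before the next iteration starts.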
