alvas

Reputation: 122012

How to iterate through a defaultdict(list) in Python?

How do I iterate through a defaultdict(list) in Python? Is there a better way of having a dictionary of lists in Python? I've tried the usual iter(dict), but I get this error:

>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence

The main script:

import para
para.print_doc('./foo/bar/para-lines.txt')

The para.py module:

# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict
class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator;
    #  the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        #  else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if separator(line):  # separator is a callable, so call it
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

An example of ./foo/bar/para-lines.txt looks like this:

This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.

The output of the main script should look like this:

Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.

Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.

Upvotes: 2

Views: 5256

Answers (5)

unutbu

Reputation: 879143

The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools like itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby:

import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))

Upvotes: 2

Daniel Roseman

Reputation: 599490

I can't think of any reason why you're using a dict here, let alone a defaultdict. A list of lists would be much simpler.

doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
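
For reference, here is a self-contained sketch of the same list-of-lists idea, written for Python 3 (the snippets in this thread target Python 2). The function name read_paragraphs and the use of io.StringIO as a stand-in for the file are assumptions made for illustration, and the separator is assumed to be a blank line. Unlike the snippet above, this version skips empty paragraphs.

```python
import io


def read_paragraphs(lines, separator='\n'):
    """Group an iterable of lines into a list of paragraphs (list of lists)."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            # Close the current paragraph; skip empty ones
            # (for example, two separators in a row).
            if para:
                doc.append(para)
            para = []
        else:
            para.append(line)
    if para:  # the last paragraph may have no trailing separator
        doc.append(para)
    return doc


# io.StringIO stands in for open(filename) so the sketch runs as-is.
sample = io.StringIO(
    "This is a start of a paragraph.\n"
    "foo barr\n"
    "\n"
    "This is the start of next para.\n"
)
doc = read_paragraphs(sample)

# enumerate(..., start=1) gives the 1-based numbering asked for.
for p, para in enumerate(doc, start=1):
    for s, sent in enumerate(para, start=1):
        print("Para#%d,Sent#%d: %s" % (p, s, sent.rstrip('\n')))
```

Because the paragraphs are already in order, no keys are needed; enumerate supplies the paragraph and sentence numbers at print time.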

Upvotes: 0

Kathy Van Stone

Reputation: 26271

The problem you have with the line

for para in iter(doc):

is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:

  1. Save the doc created in the __init__ method as an instance variable (self.doc, for example).

  2. Either make Paragraphs itself iterable (by adding an __iter__ method), or allow it to access the created doc object.
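
The two steps above could be sketched roughly as follows, in Python 3. This is an illustration, not the asker's class: it assumes the constructor takes an iterable of lines instead of a filename, and that a blank line ('\n') separates paragraphs.

```python
from collections import defaultdict


class Paragraphs:
    def __init__(self, lines):
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            if line == '\n':
                para_index += 1
            else:
                self.doc[para_index].append(line)

    def __iter__(self):
        # Step 2: make the object itself iterable by yielding each
        # paragraph (a list of lines) in paragraph order.
        for key in sorted(self.doc):
            yield self.doc[key]


text = Paragraphs(["first\n", "second\n", "\n", "third\n"])
paragraphs = list(text)  # works because __iter__ is defined
```

With __iter__ in place, callers can write "for para in text:" directly, while text.doc remains available for lookups by paragraph index.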

Upvotes: 4

soulcheck

Reputation: 36767

It's failing because you don't have __iter__() defined in your Paragraphs class and then try to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class has to define __iter__(), which returns an iterator. See the Python documentation on iterator types.
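
A minimal Python 3 sketch of the difference; the class names Bare and WithIter are hypothetical and exist only for this demonstration. Note that Python 3 words the TypeError differently from the traceback in the question.

```python
class Bare:
    """Holds items but defines no __iter__, so iter() rejects it."""

    def __init__(self, items):
        self.items = items


class WithIter(Bare):
    """Same data, but iterable because __iter__ returns an iterator."""

    def __iter__(self):
        return iter(self.items)


try:
    iter(Bare([1, 2]))
    failed = False
except TypeError:
    # Raised because Bare defines neither __iter__ nor __getitem__.
    failed = True

values = list(WithIter([1, 2]))
```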

Upvotes: 0

Nicolas78

Reputation: 5144

The problem seems to be that you're iterating over your Paragraphs class, not the dictionary. Also, instead of iterating over keys and then accessing the dictionary entry, consider using

for (key, value) in d.items():
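
Filled out into a runnable Python 3 sketch, that loop might look like this. The sample data and the names d and lines are made up for illustration; sorted() is used so paragraphs come out in key order.

```python
from collections import defaultdict

d = defaultdict(list)
d[0].append("This is a start of a paragraph.")
d[0].append("foo barr")
d[1].append("This is the start of next para.")

lines = []
# items() yields (key, list) pairs, so no second lookup d[key] is needed.
for key, value in sorted(d.items()):
    for n, sent in enumerate(value, start=1):
        # Keys are 0-based here; add 1 for the Para#1-style numbering.
        lines.append("Para#%d,Sent#%d: %s" % (key + 1, n, sent))

print("\n".join(lines))
```

Using enumerate for the sentence number also avoids list.index(), which returns the first match and so gives the wrong position when a paragraph contains the same line twice.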

Upvotes: 0
