Reputation: 122012
How do I iterate through a defaultdict(list) in Python?
Is there a better way of having a dictionary of lists in Python?
I've tried the normal iter(dict), but I got this error:
>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence
The main script:
import para
para.print_doc('./foo/bar/para-lines.txt')
The para.py module:
# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
from collections import defaultdict

class Paragraphs:
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')
    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile
def print_doc(filename):
    text = Paragraphs(filename)
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)
An example of ./foo/bar/para-lines.txt looks like this:
This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.
The output of the main script should look like this:
Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.
Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.
Upvotes: 2
Views: 5256
Reputation: 879143
The recipe you linked to is rather old. It was written in 2001, before Python had more modern tools like itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code could look like using groupby:
import itertools
import sys

with open('para-lines.txt', 'r') as f:
    paranum = 0
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start=1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i=paranum, n=n, s=sentence))
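For readers on Python 3, the same groupby idea can be sketched as a function that takes any iterable of lines and returns the formatted strings instead of printing them. The name number_sentences and the choice to treat whitespace-only lines as separators are assumptions made for illustration, not part of the code above.

```python
import itertools


def number_sentences(lines):
    """Return 'Para#i,Sent#n: text' strings for blank-line-separated paragraphs.

    Hypothetical helper: the name and the blank-line test are assumptions.
    """
    numbered = []
    paranum = 0
    # groupby yields runs of consecutive lines that share the same key;
    # the key is True for separator lines and False for paragraph lines.
    for is_separator, group in itertools.groupby(lines, lambda line: not line.strip()):
        if is_separator:
            continue
        paranum += 1
        for n, sentence in enumerate(group, start=1):
            numbered.append(
                'Para#{0},Sent#{1}: {2}'.format(paranum, n, sentence.rstrip('\n')))
    return numbered


sample = ['first line\n', 'second line\n', '\n', 'third line\n']
for entry in number_sentences(sample):
    print(entry)
```

Because a file object is itself an iterable of lines, passing an open file in place of the sample list should work the same way.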
Upvotes: 2
Reputation: 599490
I can't think of any reason why you're using a dict here, let alone a defaultdict. A list of lists would be much simpler.
doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
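Once the paragraphs are in a list of lists, numbering them needs nothing more than enumerate. A minimal sketch, assuming Python 3 and a hypothetical doc value standing in for one built by a loop like the one above:

```python
# Hypothetical data standing in for the doc list built from a file.
doc = [
    ['This is a start of a paragraph.\n', 'foo barr\n'],
    ['This is the start of next para.\n'],
]

# enumerate supplies the 1-based paragraph and sentence numbers,
# so no dictionary keys or list.index() lookups are needed.
lines_out = []
for para_num, para in enumerate(doc, start=1):
    for sent_num, sent in enumerate(para, start=1):
        lines_out.append(
            'Para#%d,Sent#%d: %s' % (para_num, sent_num, sent.rstrip('\n')))

print('\n'.join(lines_out))
```

One reason to prefer enumerate here: list.index(sent) returns the position of the first match, so a paragraph containing the same sentence twice would be numbered incorrectly.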
Upvotes: 0
Reputation: 26271
The problem you have with the line
for para in iter(doc):
is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you create in the __init__ method goes out of scope and is lost. So you need to do two things:
1. Save the doc created in the __init__ method as an instance variable (self.doc, for example).
2. Either make Paragraphs itself iterable (by adding an __iter__ method), or give callers access to the created doc object.
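Those two steps might look like the sketch below. It is written for Python 3, takes an iterable of lines rather than a filename so that it stays self-contained, and treats whitespace-only lines as separators. All three choices are assumptions, not the original recipe.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs keyed by a 1-based paragraph number."""

    def __init__(self, lines):
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 1
        for line in lines:
            if not line.strip():
                # Separator line: later lines belong to the next paragraph.
                para_index += 1
            else:
                self.doc[para_index].append(line.rstrip('\n'))

    def __iter__(self):
        # Step 2: iterating the object yields (paragraph number, sentences) pairs.
        return iter(sorted(self.doc.items()))


text = Paragraphs(['one\n', 'two\n', '\n', 'three\n'])
for para_num, sentences in text:
    for sent_num, sent in enumerate(sentences, start=1):
        print('Para#%d,Sent#%d: %s' % (para_num, sent_num, sent))
```

With __iter__ in place, callers can loop over the Paragraphs object directly, while text.doc remains available for anyone who wants the underlying dictionary.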
Upvotes: 4
Reputation: 36767
It's failing because you don't have __iter__() defined in your Paragraphs class and then try to call iter(doc) (where doc is a Paragraphs instance).
To be iterable, a class has to define __iter__(), which returns an iterator (see the Python docs on iterator types).
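A small illustration of that rule, assuming Python 3 (where the error message is worded differently from the Python 2 one in the question). Both class names are hypothetical:

```python
class WithoutIter:
    """Holds a dict but defines no __iter__, so iter() rejects it."""

    def __init__(self):
        self.doc = {1: ['a', 'b']}


class WithIter(WithoutIter):
    """Same data, but delegates iteration to the wrapped dict."""

    def __iter__(self):
        return iter(self.doc)


try:
    iter(WithoutIter())
    failed = False
except TypeError:
    # Raised because the object supports no iteration protocol.
    failed = True

print(failed)
print(list(WithIter()))
```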
Upvotes: 0
Reputation: 5144
The problem seems to be that you're iterating over your Paragraphs instance, not the dictionary. Also, instead of iterating over keys and then accessing the dictionary entry, consider using
for (key, value) in d.items():
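For instance, with a defaultdict(list) the loop could look like this. A minimal sketch assuming Python 3; the variable names and sample data are made up for illustration:

```python
from collections import defaultdict

d = defaultdict(list)
d[1].append('This is a start of a paragraph.')
d[1].append('foo barr')
d[2].append('This is the start of next para.')

# items() yields each key together with its list, so there is no
# second lookup such as d[key] inside the loop.
pairs = []
for key, value in sorted(d.items()):
    for n, sentence in enumerate(value, start=1):
        pairs.append((key, n, sentence))
        print('Para#%d,Sent#%d: %s' % (key, n, sentence))
```

Sorting the items keeps the paragraphs in numeric order regardless of the order in which the keys were inserted.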
Upvotes: 0