Reputation: 15
I've got the following code:
for paragraph in document.paragraphs:
    while paragraph.style.name == 'Heading 2':
        print(paragraph.style.name)
        print(paragraph.text)
This doesn't work because I can't figure out the right logic. I'm using the python-docx library (https://python-docx.readthedocs.io/en/latest/user/styles-using.html) to iterate through the document's paragraphs.
Now I want to split the list of paragraphs into sublists, each starting at a Heading 2 paragraph and then collecting all the following paragraphs with a different paragraph.style.name until the next Heading 2 element, so that each chunk contains one Heading 2 paragraph with its corresponding text.
In other words, I'm looking for a way to split the list into chunks from one element to another. Please help :)
Upvotes: 1
Views: 94
Reputation: 13106
You could use itertools.groupby to accomplish this:
from itertools import groupby

groups, next_group = [], []
for k, group in groupby(document.paragraphs, lambda x: x.style.name == 'Heading 2'):
    # If the predicate is True and next_group is populated,
    # we create a new chunk
    if k and next_group:
        groups.append(next_group)
        next_group = []
    # Fill up the current chunk
    for paragraph in group:
        # feel free to swap this out with a print statement
        # or whatever data structure suits you
        next_group.append({'style_name': paragraph.style.name, 'text': paragraph.text})

# Append the final chunk, which the loop itself never flushes
if next_group:
    groups.append(next_group)
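I'm using a list of dictionaries here for clarity, but you can substitute any data structure. Note that groupby puts consecutive Heading 2 paragraphs into the same group, so back-to-back headings end up in one chunk.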
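If you would rather have every Heading 2 start its own chunk, even when two headings follow each other directly, a plain loop may be easier to reason about than groupby. Below is a minimal sketch, not a definitive implementation: split_by_heading and fake_paragraph are hypothetical names I made up, and the SimpleNamespace objects only stand in for python-docx paragraphs (anything with .style.name and .text) so the example runs without a .docx file.

```python
from types import SimpleNamespace


def split_by_heading(paragraphs, heading_style="Heading 2"):
    """Split paragraphs into chunks, each starting at a heading paragraph."""
    chunks = []
    for paragraph in paragraphs:
        # Start a new chunk at every heading, and also at the very first
        # paragraph so that text before the first heading is not lost.
        if paragraph.style.name == heading_style or not chunks:
            chunks.append([])
        chunks[-1].append(paragraph)
    return chunks


def fake_paragraph(style_name, text):
    # Hypothetical stand-in for a python-docx paragraph object
    return SimpleNamespace(style=SimpleNamespace(name=style_name), text=text)


sample = [
    fake_paragraph("Heading 2", "Intro"),
    fake_paragraph("Normal", "First body"),
    fake_paragraph("Heading 2", "Details"),
    fake_paragraph("Normal", "Second body"),
    fake_paragraph("Normal", "Third body"),
]

chunks = split_by_heading(sample)
chunk_texts = [[p.text for p in chunk] for chunk in chunks]
print(chunk_texts)
```

With a real document you would call split_by_heading(document.paragraphs) instead of passing the sample list.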
Upvotes: 1