Alan Kashkash

Reputation: 15

Python chunk list from one element to another

I've got the following code:

for paragraph in document.paragraphs:
    while paragraph.style.name == 'Heading 2':
        print(paragraph.style.name)
        print(paragraph.text)

This doesn't work because I can't figure out the right logic. I'm using the python-docx library (https://python-docx.readthedocs.io/en/latest/user/styles-using.html) to iterate through the document's paragraphs.

I want to split the list of paragraphs into sublists, each starting at a Heading 2 paragraph and including every following paragraph with a different paragraph.style.name, up to (but not including) the next Heading 2. Each chunk should then contain one Heading 2 paragraph and its corresponding text.

In other words, I'm looking for a way to split the list into chunks from one element to another. Please help :)

Upvotes: 1

Views: 94

Answers (1)

C.Nivs

Reputation: 13106

You could use itertools.groupby to accomplish this:

from itertools import groupby

groups, next_group = [], []

for k, group in groupby(document.paragraphs, lambda x: x.style.name == 'Heading 2'):
    # If the predicate is True and next_group is populated,
    # we create a new chunk
    if k and next_group:
        groups.append(next_group)
        next_group = []

    # Fill up the current chunk
    for paragraph in group:
        # feel free to swap this out with a print statement
        # or whatever data structure suits you
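        next_group.append({'style_name': paragraph.style.name, 'text': paragraph.text})

# The loop only appends a chunk when the next heading shows up,
# so the final chunk has to be added once the loop is done
if next_group:
    groups.append(next_group)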

I'm using a list of dictionaries here for clarity, but you can substitute any data structure.
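
If groupby feels like more machinery than you need, a plain loop is another option. The following is only a sketch: chunk_by_heading and fake_paragraph are hypothetical names made up for this example, and the fake paragraphs are stand-ins that mimic the paragraph.style.name and paragraph.text attributes so the snippet runs without a .docx file. With a real document you would pass document.paragraphs instead.

```python
from types import SimpleNamespace


def chunk_by_heading(paragraphs, heading_style="Heading 2"):
    """Split paragraphs into chunks, each one starting at a heading paragraph."""
    chunks = []
    for paragraph in paragraphs:
        # Open a new chunk at every heading, and also for any text
        # that appears before the first heading
        if paragraph.style.name == heading_style or not chunks:
            chunks.append([])
        chunks[-1].append(paragraph)
    return chunks


def fake_paragraph(style_name, text):
    # Hypothetical stand-in exposing only .style.name and .text
    return SimpleNamespace(style=SimpleNamespace(name=style_name), text=text)


sample = [
    fake_paragraph("Normal", "Preface"),
    fake_paragraph("Heading 2", "First"),
    fake_paragraph("Normal", "a"),
    fake_paragraph("Normal", "b"),
    fake_paragraph("Heading 2", "Second"),
    fake_paragraph("Heading 2", "Third"),
    fake_paragraph("Normal", "c"),
]

chunks = chunk_by_heading(sample)
texts = [[p.text for p in chunk] for chunk in chunks]
print(texts)
# [['Preface'], ['First', 'a', 'b'], ['Second'], ['Third', 'c']]
```

One difference worth knowing: groupby puts consecutive items with the same key into a single group, so two Heading 2 paragraphs in a row would land in the same chunk with the groupby version, whereas this loop starts a new chunk for each of them. Which behaviour you want depends on your documents.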

Upvotes: 1
