Reputation: 27

Python split string to next period punctuation

I want to split a string after every 8 words. If the 8th word doesn't end with a period or exclamation mark (. or !), the split should move to the next word that does.

I can already split the string into 8-word chunks:

with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        listword = [' '.join(text[i:i+n]) for i in range(0,len(text),n)]
        for lsb in listword:
            print(lsb)

The expected output should be

I'm going to the mall for breakfast, Please meet me there for lunch. 
The duration of the next. He figured I was only joking!
I brought back the time.

This is what I'm getting

I'm going to the mall for breakfast, Please
meet me there for lunch. The duration of 
the next. He figured I was only joking!
I brought back the time.

Upvotes: 1

Answers (4)

Abe

Reputation: 5508

It doesn't look like you've told your code to look for . or !, only to split the text into 8-word chunks. Here's one solution:

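buffer = []
output = []

with open("file.txt") as c:
    # read the whole file and split it on any whitespace
    for word in c.read().split():
        buffer.append(word)
        # parenthesize the punctuation test; `and` binds tighter than `or`
        if ('!' in word or '.' in word) and len(buffer) > 7:
            output.append(' '.join(buffer))
            buffer = []

# keep any trailing words that never met the condition
if buffer:
    output.append(' '.join(buffer))

print(output)

This takes in a list of words, split on whitespace. It adds words to a buffer until your conditions are met (the word contains punctuation and the buffer is longer than 7 words). Then it appends that buffer to your output and clears the buffer. Any words left over at the end are appended as a final chunk.

I don't know how your file is structured, so I tested with the text as one long string of sentences, which is what c.read() gives you here. If you need to handle the file line by line, you might have to do some fiddling with the input.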
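
A detail worth checking in any condition that mixes `or` and `and`: Python's `and` binds more tightly than `or`, so a test without parentheses may group differently from how it reads. A minimal sketch, where `word` and `buffer` hold purely illustrative values:

```python
word = "joking!"
buffer = ["only", "joking!"]  # illustrative: far fewer than 8 words

# Without parentheses this parses as:
#   '!' in word or ('.' in word and len(buffer) > 7)
unparenthesized = '!' in word or '.' in word and len(buffer) > 7

# With parentheses the length check applies to both punctuation marks
parenthesized = ('!' in word or '.' in word) and len(buffer) > 7

print(unparenthesized)  # True: the '!' alone satisfies the condition
print(parenthesized)    # False: the buffer is still too short
```

So a word containing `!` would end a chunk no matter how short the buffer is unless the punctuation test is grouped explicitly.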

Upvotes: 1

Endyd

Reputation: 1279

As you probably know, you haven't coded anything to check for punctuation. The best way to do this might be using two indexes to keep track of the start and end of the section you want to print. The section must be at least 8 words, but larger if punctuation is not found on the 8th word.

n = 8
with open('file.txt') as c:
    for line in c:
        words = line.split()

        # Use two indexes to keep track of which section to print
        start = 0
        end = start + n - 1  # index of the nth word of this section
        while end < len(words):
            # If the nth word has no punctuation, advance end to the next word that does
            while end < len(words) - 1 and '.' not in words[end] and '!' not in words[end]:
                end += 1
            print(' '.join(words[start:end + 1]))  # print from start to end, including word at end
            start = end + 1  # advance start to one after the last printed word
            end = start + n - 1  # index of the nth word of the next section
        if start < len(words):
            print(' '.join(words[start:]))  # print the last section regardless of punctuation

Result:

I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.

Upvotes: 0

VPfB

Reputation: 17247

You are adding line breaks to a sequence of words. The main condition for a line break is that the last word ends with a . or !. Plus, there is a secondary condition on the minimum length (8 words or more). The following code gathers the words in a buffer until the condition to print a line is satisfied.

with open("file.txt") as c:
    out = []
    for line in c:
        for word in line.split():
            out.append(word)
            if word.endswith(('.', '!')) and len(out) >= 8:
                print(' '.join(out))
                out.clear()
    # don't forget to flush the buffer
    if out:
        print(' '.join(out))
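
To try the same buffering idea without a file, here is a minimal sketch that wraps it in a function operating on a string. The name `split_at_punctuation` and its `min_words` parameter are my own choices for illustration, not something from the question; the sample text is the one from the question.

```python
def split_at_punctuation(text, min_words=8):
    """Return chunks of at least min_words words, each ending in '.' or '!'."""
    lines = []
    out = []
    for word in text.split():
        out.append(word)
        # close the chunk only when it is long enough and ends a sentence
        if word.endswith(('.', '!')) and len(out) >= min_words:
            lines.append(' '.join(out))
            out = []
    if out:  # flush whatever is left, even if it is short
        lines.append(' '.join(out))
    return lines


sample = ("I'm going to the mall for breakfast, Please meet me there for lunch. "
          "The duration of the next. He figured I was only joking! "
          "I brought back the time.")

for chunk in split_at_punctuation(sample):
    print(chunk)
```

Returning a list instead of printing makes the result easy to check or reuse; on the sample it yields the three lines the question asks for.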

Upvotes: 1

Relandom

Reputation: 1039

I'm not sure how to do this with a list comprehension, but you could do it with a regular for loop.

with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        temp = []
        listword = []
        for val in text:
            temp.append(val)
            # close the chunk once it has at least n words and ends with '.' or '!'
            if len(temp) >= n and val.endswith(('.', '!')):
                listword.append(' '.join(temp))
                temp = []
        if temp:  # append any leftover words that did not complete a chunk
            listword.append(' '.join(temp))

        # print the chunks for this line before moving on to the next one
        for lsb in listword:
            print(lsb)

Upvotes: 0
