Reputation: 27
Split string after every 8 words. If the 8th word doesn't have a (. or !), move to the next word that does.
I can split the words from the string.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        listword = [' '.join(text[i:i+n]) for i in range(0,len(text),n)]
        for lsb in listword:
            print(lsb)
The expected output should be
I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.
This is what I'm getting
I'm going to the mall for breakfast, Please
meet me there for lunch. The duration of
the next. He figured I was only joking!
I brought back the time.
Upvotes: 1
Views: 1362
Reputation: 5508
It doesn't look like you've told your code to look for . or !, only to split the text into 8-word chunks. Here's one solution:
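buffer = []
output = []
with open("file.txt") as c:
    # read the whole file and split it on any whitespace
    for word in c.read().split():
        buffer.append(word)
        # the parentheses matter: `and` binds more tightly than `or`
        if ('!' in word or '.' in word) and len(buffer) > 7:
            output.append(' '.join(buffer))
            buffer = []
if buffer:  # keep trailing words that never met the condition
    output.append(' '.join(buffer))
print(output)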
This takes in a list of words, split at the spaces. It adds words to a buffer until your conditions are met (the word contains punctuation and the buffer is longer than 7 words). Then it appends that buffer to your output and clears the buffer.
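One thing to watch when writing a condition like this: in Python, `and` binds more tightly than `or`, so the punctuation test needs its own parentheses or the length check only applies to one of the two marks. A small sketch with made-up values (`word` and `buffer` here are just for illustration):

```python
word = "joking!"
buffer = ["only", "joking!"]  # illustrative: only 2 words collected so far

# Without parentheses this parses as: '!' in word or ('.' in word and len(buffer) > 7)
ungrouped = '!' in word or '.' in word and len(buffer) > 7

# With parentheses the length check applies to both punctuation marks
grouped = ('!' in word or '.' in word) and len(buffer) > 7

print(ungrouped)  # True: the '!' alone satisfies the condition
print(grouped)    # False: the buffer is still too short
```

So the ungrouped form would split after any word containing `!`, no matter how short the buffer is.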
I don't know how your file is structured, so I tested with c as a long string of sentences. You might have to do some fiddling with the input to get it to come in the way this code is expecting.
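As a rough sketch of that fiddling, assuming the file is plain text with words separated by spaces or newlines: read the whole thing and call `split()` with no argument, which splits on any run of whitespace. The helper name `read_words` is made up for this example, and `io.StringIO` stands in for the real file so the snippet runs on its own.

```python
import io

def read_words(handle):
    """Return every whitespace-separated word from an open text file."""
    # split() with no argument treats spaces, tabs and newlines alike
    return handle.read().split()

# io.StringIO is a stand-in for open("file.txt") so no real file is needed
sample = io.StringIO("I'm going to the mall.\nPlease meet me there!\n")
words = read_words(sample)
print(words)
```

With a real file you would pass the handle from `with open("file.txt") as c:` instead of the `StringIO` object.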
Upvotes: 1
Reputation: 1279
As you probably know, you haven't coded anything to check for punctuation. The best way to do this might be using two indexes to keep track of the start and end of the section you want to print. The section must be at least 8 words, but larger if punctuation is not found on the 8th word.
n = 8
with open('file.txt') as c:
    for line in c:
        words = line.split()
        # Use two indexes to keep track of which section to print
        start = 0
        end = start + n - 1  # index of the nth word in this section
        while end < len(words):
            # If the word at end has no punctuation, advance end until one does (or the line runs out)
            while end < len(words) - 1 and '.' not in words[end] and '!' not in words[end]:
                end += 1
            print(' '.join(words[start:end + 1]))  # print from start to end, including word at end
            start = end + 1  # advance start to one after last word
            end = start + n - 1  # the nth word of the next section
        if start < len(words):
            print(' '.join(words[start:]))  # print the last section regardless of punctuation
Result:
I'm going to the mall for breakfast, Please meet me there for lunch.
The duration of the next. He figured I was only joking!
I brought back the time.
Upvotes: 0
Reputation: 17247
You are adding line breaks to a sequence of words. The main condition for a line break is that the last word ends with a . or !. There is also a secondary condition on the minimum length (8 words or more). The following code gathers the words in a buffer until the condition to print a line is satisfied.
with open("file.txt") as c:
    out = []
    for line in c:
        for word in line.split():
            out.append(word)
            if word.endswith(('.', '!')) and len(out) >= 8:
                print(' '.join(out))
                out.clear()
    # don't forget to flush the buffer
    if out:
        print(' '.join(out))
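If you want to check the logic without a file, the same buffering idea can be wrapped in a generator. This is only a sketch: the name `chunk_words` and its parameters `min_words` and `enders` are my own, and the sample text is the one from the question.

```python
def chunk_words(words, min_words=8, enders=('.', '!')):
    """Group words into lines of at least min_words that end on punctuation.

    The final line may be shorter, or end without punctuation, if the
    words run out first.
    """
    out = []
    for word in words:
        out.append(word)
        # str.endswith accepts a tuple of suffixes
        if word.endswith(enders) and len(out) >= min_words:
            yield ' '.join(out)
            out = []
    if out:  # flush whatever is left in the buffer
        yield ' '.join(out)

text = ("I'm going to the mall for breakfast, Please meet me there for lunch. "
        "The duration of the next. He figured I was only joking! "
        "I brought back the time.")
lines = list(chunk_words(text.split()))
for line in lines:
    print(line)
```

On the question's sample this produces the three expected lines, with the last one coming from the final flush.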
Upvotes: 1
Reputation: 1039
I am not sure how to achieve that with a list comprehension, but you could do it with a regular for loop.
with open("file.txt") as c:
    for line in c:
        text = line.split()
        n = 8
        temp = []
        listword = []
        for val in text:
            temp.append(val)
            # close the chunk once it has at least n words and ends with '.' or '!'
            if len(temp) >= n and val.endswith(('.', '!')):
                listword.append(' '.join(temp))
                temp = []
        if temp:  # append whatever is left over, even if it has fewer than n words
            listword.append(' '.join(temp))
        for lsb in listword:
            print(lsb)
Upvotes: 0