Reputation: 565
I have a text file with the following structure:
name1:
sentence. [sentence. ...] # can be one or more
name2:
sentence. [sentence. ...]
EDIT Input sample:
Djohn:
Hello. I am Djohn
I am Djohn.
Bot:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
est laborum.
Ninja:
Hey guys!! wozzup
EDIT 2 Input sample:
This is example sentence that can come before first speaker.
Djohn:
Hello. I am Djohn
I am Djohn.
Bot:
Yes, I understand, don't say it twice lol
Ninja:
Hey guys!! wozzup
Each item (a name or one or more sentences) is a Unicode string. I put this data into a list and want to build a dictionary:
{
'name1': [[sentence.], ..]
'name2': [[sentence.], ..]
}
EDIT 3
The dictionary I am building is a bunch of Unicode strings and is intended to be written to a file.
What I am trying to do is this:
for i, paragraph in enumerate(paragraphs):  # paragraphs is the list with Unicode strings
    if isParagraphEndsWithColon(paragraph):
        name = paragraph
        text = []
        for p in range(paragraphs[i], paragraphs[-1]):
            if isParagraphEndsWithColon(p):
                break
            localtext.extend(p)
        # this is output dictionary I am trying to build
        outputDocumentData[name].extend(text)
I.e. I need a nested loop that runs from the found 'name:' line until the next one, extending the list of sentences for the same key (the name). The problem is that range() doesn't work here for me, because it expects integers.
I am looking for a "pythonic" way to run a nested loop from the current element to the end of the list. (It feels like slicing the list on each iteration would be inefficient.)
Upvotes: 0
Views: 126
Reputation: 61930
You could use groupby:
from itertools import groupby

lines = ["Djohn:",
         "Hello. I am Djohn",
         "I am Djohn.",
         "Bot:",
         "Yes, I understand, don't say it twice lol",
         "Ninja:",
         "Hey guys!! wozzup"]

name = ''
result = {}
# groupby yields alternating runs: name lines (k is True), then sentence lines (k is False)
for k, v in groupby(lines, key=lambda x: x.endswith(':')):
    if k:
        # strip the trailing colon so the key is just the name
        name = ''.join(v).rstrip(':')
    else:
        result.setdefault(name, []).extend(v)

print(result)
Output
{'Djohn': ['Hello. I am Djohn', 'I am Djohn.'], 'Bot': ["Yes, I understand, don't say it twice lol"], 'Ninja': ['Hey guys!! wozzup']}
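The idea is to split the input into alternating runs of name lines and non-name lines, so the key function is lambda x: x.endswith(':'). Any text that comes before the first speaker ends up under the empty-string key, because name starts out as ''.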
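To make the grouping easier to see, here is a small sketch of what groupby yields with that key on a shortened version of the sample input. The name runs is only illustrative. Note that groupby only merges consecutive items with the same key, which is exactly what is wanted here.

```python
from itertools import groupby

lines = ["Djohn:",
         "Hello. I am Djohn",
         "I am Djohn.",
         "Bot:",
         "Yes, I understand, don't say it twice lol"]

# groupby returns lazy iterators, so each run is copied into a list for inspection
runs = [(is_name, list(group))
        for is_name, group in groupby(lines, key=lambda x: x.endswith(':'))]

for is_name, group in runs:
    print(is_name, group)
# True ['Djohn:']
# False ['Hello. I am Djohn', 'I am Djohn.']
# True ['Bot:']
# False ["Yes, I understand, don't say it twice lol"]
```

Each True run holds a name line and the False run right after it holds that speaker's sentences, so a single pass over the runs is enough and no nested loop or slicing is needed.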
Upvotes: 3