Reputation: 861
I have a text file in the format below. It contains multiple "context" parts, each with text spanning multiple lines, followed by a one-line topic and then several questions with different ids about the context paragraph. I want to store the contexts in a list, where each context is one element of the list. My approach is to take all the lines between a line that starts with "context" and the next line that starts with "topic". However, once I set the condition that selects the lines between context and topic, I cannot join the lines of each context into a single string. Below are the file format and my code.
context :
|
topic:
|
question:
answer:
id:
|
question:
answer:
id:
|
context:
|
topic:
|
question:
answer:
id:
.
.
.
context = []
f = open("example.txt","r")
context_line = True
for line in f:
    if not line.strip():
        continue
    str1 = ""
    if line.startswith("context"):
        context_line = True
    elif line.startswith("topic"):
        context_line = False
    if context_line:
        # Here how can I join the lines?
        str1 += line.rstrip("\n").lstrip("\ufeff").strip("|")
        context.append(str1)
Upvotes: 0
Views: 702
Reputation: 1653
You can keep track of all the lines in the context and join them when the topic part starts:
context = []
str1 = []  # lines of the context currently being read
context_line = True
with open("example.txt", "r") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("context"):
            context_line = True
            str1 = []  # start collecting a new context
        elif line.startswith("topic"):
            lines = ' '.join(str1) # here you can choose how to join the lines
            context.append(lines)
            context_line = False
        if context_line:
            str1.append(line.rstrip("\n").lstrip("\ufeff").strip("|"))
On a side note, be aware that this method does not check that the input file is correctly formatted. In particular, if a context section is not immediately followed by a topic section, it will not work as intended: the unfinished context is never appended to the list and is discarded when the next context line resets str1.
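If you want the formatting problem to be reported instead of silently ignored, a stricter variant of the same loop could look like the sketch below. The function name extract_contexts and the joiner parameter are made up for illustration, and the sketch assumes that header lines always start with "context" or "topic", that lines consisting only of "|" are separators, and that no line of the context text itself starts with one of those two words. Unlike the loop above, it leaves the "context" header line out of the result.

```python
def extract_contexts(lines, joiner=" "):
    """Return the text between each 'context' header and the next 'topic' header."""
    contexts = []
    current = None  # None means we are not inside a context section
    for raw in lines:
        line = raw.lstrip("\ufeff").strip()
        if not line or line == "|":
            continue  # skip blank lines and bare separator lines
        if line.startswith("context"):
            if current is not None:
                raise ValueError("context section is not followed by a topic line")
            current = []
            # keep any text that shares the line with the header
            rest = line.partition(":")[2].strip()
            if rest:
                current.append(rest)
        elif line.startswith("topic"):
            if current is None:
                raise ValueError("topic line without a preceding context")
            contexts.append(joiner.join(current))
            current = None
        elif current is not None:
            # ordinary line inside a context section
            current.append(line.strip("|").strip())
    if current is not None:
        raise ValueError("file ends inside a context section")
    return contexts
```

Because the function accepts any iterable of lines, you can pass it an open file, for example inside "with open("example.txt", "r") as f:" call extract_contexts(f), or a plain list of strings when testing.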
Upvotes: 1