Reputation: 731
I have an external file that I need to read into a dictionary. Each article begins with <NEW DOCUMENT>, and I don't know how to pull all the text from the file, starting on the line below one <NEW DOCUMENT> marker and ending just before the next one. Here is what I have so far:
for line in file2:
    line = line.strip()
    line_list = line.split()
    if "NEW DOCUMENT" in line:
        doc_num+=1
        new_dict[doc_num] = line
        print(new_dict)
The file looks like this.
<NEW DOCUMENT>
Look on the bright
side of Life.
<NEW DOCUMENT>
look on the very, dark
side of the Moon
Upvotes: 0
Views: 310
Reputation: 50200
This'll do it for you:
docs = file2.read().split("<NEW DOCUMENT>\n")
It gives you a list, not a dictionary, because why would you want a dictionary whose keys are sequential numbers? But if you must have a dictionary, use:
new_dict = dict(enumerate(docs))
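One thing to watch for with the split approach: the sample file starts with the marker, so the first element of the split is an empty string. Here is a minimal sketch that drops it before numbering. It assumes an in-memory string (the name `text` is only for illustration) in place of `file2.read()`, and it numbers from 1 on the guess that `doc_num` in the question starts at 0.

```python
# Stand-in for file2.read(); the content mirrors the sample file in the question.
text = (
    "<NEW DOCUMENT>\n"
    "Look on the bright\n"
    "side of Life.\n"
    "<NEW DOCUMENT>\n"
    "look on the very, dark\n"
    "side of the Moon\n"
)

pieces = text.split("<NEW DOCUMENT>\n")
# pieces[0] is '' because the text starts with the marker, so skip empty pieces.
docs = [piece.strip() for piece in pieces if piece.strip()]

# Number the documents from 1 (assumption: doc_num starts at 0 in the question).
new_dict = dict(enumerate(docs, start=1))
print(new_dict)
```

If you'd rather keep 0-based keys, drop the `start=1` argument.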
Upvotes: 0
Reputation: 97601
Here's a modification to your solution:
docs = []
document = []
for line in file2:
    line = line.strip()
    if line == "<NEW DOCUMENT>":
        # start a new document
        document = []
        docs.append(document)
    else:
        # append to the current one
        document.append(line)

# convert lists of lines into a string
docs = ['\n'.join(document) for document in docs]
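If you want a dictionary straight out of the line-by-line loop, something along these lines might work. It is only a sketch: `read_documents` is a hypothetical helper name, `io.StringIO` stands in for the open `file2`, and keys start at 1 on the assumption that this is what the question's `doc_num` counter was meant to do.

```python
import io


def read_documents(lines, marker="<NEW DOCUMENT>"):
    """Group lines into documents keyed by 1-based position (hypothetical helper)."""
    docs = {}
    doc_num = 0
    for line in lines:
        line = line.strip()
        if line == marker:
            # A marker line opens the next document.
            doc_num += 1
            docs[doc_num] = []
        elif doc_num:
            # Lines that appear before the first marker are ignored.
            docs[doc_num].append(line)
    # Join each document's lines back into a single string.
    return {num: "\n".join(body) for num, body in docs.items()}


# Stand-in for the open file; the content mirrors the sample in the question.
file2 = io.StringIO(
    "<NEW DOCUMENT>\n"
    "Look on the bright\n"
    "side of Life.\n"
    "<NEW DOCUMENT>\n"
    "look on the very, dark\n"
    "side of the Moon\n"
)

new_dict = read_documents(file2)
print(new_dict)
```

Unlike reading the whole file and splitting, this keeps only one line in memory at a time while scanning, which may matter for large files.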
Upvotes: 2
Reputation: 250971
Something like this:
In [7]: with open("data1.txt") as f:
    data = f.read()
    # skip the empty piece before the first marker, then number the rest
    dic = dict((i, x.strip()) for i, x in enumerate(data.split("<NEW DOCUMENT>")[1:]))
    print(dic)
   ....:
   ....:
{0: 'Look on the bright \nside of Life.', 1: 'look on the very, dark\nside of the Moon'}
Upvotes: 0