Reputation: 11
I have a file with data like this. A line starting with '>' serves as the identifier for the lines that follow it.
>test1
this is line 1
hi there
>test2
this is line 3
how are you
>test3
this is line 5 and
who are you
I'm trying to create a dictionary like this:
{'>test1':'this is line 1hi there','>test2':'this is line 3how are you','>test3':'this is line 5who are you'}
I've read the file in, but I can't build the dictionary in this form. I want to remove the newline character at the end of each line so that the lines under each identifier are joined into one string. No spaces are needed between the joined lines, as shown above. Any help would be appreciated.
This is what I've tried so far:
new_dict = {}
db = open("/home/ak/Desktop/python_files/smalltext.txt")
for line in db:
    if '>' in line:
        new_dict[line]=''
    else:
        new_dict[line]=new_dict[line].append(line)
Upvotes: 0
Views: 123
Reputation: 35522
Here is a solution using groupby:
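from itertools import groupby
kvs=[]
# f_name is the path to the input file
with open(f_name) as f:
    # alternate between runs of '>' header lines and runs of data lines
    for k, v in groupby((e.rstrip() for e in f), lambda s: s.startswith('>')):
        kvs.append(''.join(v) if k else '\n'.join(v))
# even positions hold the headers, odd positions hold the joined data
print({k:v for k,v in zip(kvs[0::2], kvs[1::2])})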
The dict:
{'>test1': 'this is line 1\n\nhi there',
'>test2': 'this is line 3\n\nhow are you',
'>test3': 'this is line 5 and\n\nwho are you'}
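If the values should contain no newlines at all, as the question asks, a variation along these lines may be closer. It is only a sketch and not the code above: `group_records` is a hypothetical helper name, blank lines are assumed to be noise and are skipped, and the data lines are joined with an empty string.

```python
from itertools import groupby

def group_records(lines):
    """Map each '>' header to the concatenation of the data lines after it."""
    stripped = (line.strip() for line in lines)
    non_blank = (s for s in stripped if s)  # assumption: blank lines carry no data
    records = {}
    key = None
    for is_header, group in groupby(non_blank, key=lambda s: s.startswith('>')):
        if is_header:
            # A run of consecutive headers: each gets an empty value,
            # and the last one becomes the key for the data that follows.
            for key in group:
                records[key] = ''
        elif key is not None:
            records[key] = ''.join(group)
    return records

print(group_records([">a\n", "x\n", "\n", "y\n", ">b\n", "z\n"]))
```

Any open file object can be passed in place of the list, since the function only iterates over lines.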
Upvotes: 1
Reputation: 2295
Using your approach it would be:
new_dict = {}
db = open("/home/ak/Desktop/python_files/smalltext.txt", 'r')
for line in db:
    if '>' in line:
        key = line.strip()  # strips the trailing newline and surrounding whitespace
        new_dict[key]=''
    else:
        new_dict[key] += line.strip()
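For trying this without the file on disk, here is a self-contained sketch of the same loop. The function name `parse_records` and the in-memory sample are assumptions made for illustration, and `startswith('>')` is swapped in for `'>' in line` so that a '>' appearing inside a data line is not treated as an identifier.

```python
import io

def parse_records(lines):
    """Build {header: joined data lines} from an iterable of text lines."""
    records = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            key = line
            records[key] = ''
        elif key is not None:
            # data before the first header has no key and is dropped
            records[key] += line
    return records

# io.StringIO stands in for the opened file here
sample = io.StringIO(
    ">test1\nthis is line 1\nhi there\n"
    ">test2\nthis is line 3\nhow are you\n"
)
result = parse_records(sample)
print(result)
```

With a real file, `with open(path) as db: result = parse_records(db)` would also close the file once the loop finishes.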
Upvotes: 3
Reputation: 103754
You can use a regex:
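import re
di={}
# group 1: the '>' header line; group 2: everything up to the next header or end of text
pat=re.compile(r'^(>.*?)$(.*?)(?=^>|\Z)', re.S | re.M)
# fn is the path to the input file
with open(fn) as f:
    txt=f.read()
for k, v in ((m.group(1), m.group(2)) for m in pat.finditer(txt)):
    di[k]=v.strip()
print(di)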
# {'>test1': 'this is line 1\nhi there', '>test2': 'this is line 3\nhow are you', '>test3': 'this is line 5 and\nwho are you'}
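The values above still contain the inner newlines. One possible way to drop them is sketched below, under some assumptions: `regex_records` and `HEADER_BLOCK` are hypothetical names, the pattern is a slightly simplified variant of the one above, and each captured block is split into lines and rejoined with an empty string.

```python
import re

# Header line, optional newline, then a lazy block up to the next header or end of text.
HEADER_BLOCK = re.compile(r'^(>[^\n]*)\n?(.*?)(?=^>|\Z)', re.S | re.M)

def regex_records(text):
    """Return {header: data lines joined without separators}."""
    return {
        m.group(1).strip(): ''.join(part.strip() for part in m.group(2).splitlines())
        for m in HEADER_BLOCK.finditer(text)
    }

txt = ">test1\nthis is line 1\nhi there\n>test2\nthis is line 3\nhow are you\n"
print(regex_records(txt))
```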
Upvotes: 0