Reputation: 3
I have a file that looks like this:
%Labelinfo
string1
string2
%Labelinfo2
string3
string4
string5
I would like to create a dictionary whose keys are the %Labelinfo lines and whose values are the concatenation of the lines between one Labelinfo line and the next. Basically this:
{%Labelinfo : string1+string2 , %Labelinfo : string2+string3+string4}
The problem is that there can be any number of lines between two "Labelinfo" lines. For example, there may be 5 lines between %Labelinfo and %Labelinfo2, and then, say, 4 lines between %Labelinfo2 and %Labelinfo3.
However, a line that contains "Labelinfo" always starts with the same character, for example %.
How do I solve this problem?
Upvotes: 0
Views: 1980
Reputation: 5658
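d = {}
key = None
with open('fasta.txt') as f:
    text = f.read()
# str.split() with no arguments splits on any whitespace and drops empty strings
for el in text.split():
    if el.startswith('%'):
        # label line: start a new, empty entry
        key = el
        d[key] = ''
    elif key is not None:
        # data line: append it to the current label's value
        d[key] += el
print(d)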
{'%Labelinfo': 'string1string2', '%Labelinfo2': 'string3string4string5'}
Upvotes: 0
Reputation: 249
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
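d = {}
with open('Labelinfo.txt') as f:
    for line in f:
        if len(line) > 1:  # skip blank lines (a lone newline)
            if '%Labelinf' in line:
                # label line: start a new entry
                key = line.strip()
                d[key] = ""
            else:
                # data line: append it with a trailing "+" separator
                d[key] += line.strip() + "+"
# drop the trailing "+" from every value
d = {key: d[key][:-1] for key in d}
print(d)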
{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}
Upvotes: 1
Reputation: 14313
Here's how I would write it:
The program loops through every line in the file and checks whether the line is empty; if it is, the line is ignored. Otherwise the line is processed. Anything with a % at the start denotes a label, so we add it to the dictionary as a key and store it in a variable, current. Then we keep appending to the dictionary entry at key current until the next % line.
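di = {}
with open("fasta.txt", "r") as f:
    current = ""
    for line in f:
        line = line.strip()
        if line == "":
            continue  # ignore empty lines
        if line[0] == "%":
            # label line: start a new entry and remember it as the current key
            di[line] = ""
            current = line
        else:
            # data line: append to the current entry, separated by "+"
            if di[current] == "":
                di[current] = line
            else:
                di[current] += "+" + line
print(di)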
Output:
{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}
Note: Dictionaries do not enforce order (before Python 3.7 they do not preserve insertion order), so the keys may print out of order, but the values are still accessible in the same way. Also, just a heads up: your example output is slightly wrong; you forgot to put the 2 after one of the %Labelinfo keys.
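As a variant of the loop in this answer, here is a minimal sketch that collects each label's lines in a list and joins them once at the end, which avoids the empty-string check. The function name parse_labeled_blocks and its marker and sep parameters are hypothetical, chosen only for illustration, and the sample text stands in for the file contents.

```python
def parse_labeled_blocks(lines, marker="%", sep="+"):
    """Group lines under the most recent marker line (hypothetical helper)."""
    groups = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(marker):
            # label line: start a new, empty group
            current = line
            groups[current] = []
        elif current is not None:
            # data line: attach it to the most recent label
            groups[current].append(line)
        # data lines that appear before the first label are ignored
    return {label: sep.join(parts) for label, parts in groups.items()}


# Sample text standing in for the file from the question
sample = """%Labelinfo
string1
string2
%Labelinfo2
string3
string4
string5
"""
result = parse_labeled_blocks(sample.splitlines())
print(result)
```

Because the function accepts any iterable of lines, an open file object can be passed in place of sample.splitlines(). If the same label appears twice, this sketch keeps only the lines that follow its last occurrence.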
Upvotes: 0