NihilSupernum

Reputation: 3

Creating a dictionary from a FASTA file

I have a file that looks like this:

%Labelinfo

string1

string2

%Labelinfo2

string3

string4

string5

I would like to create a dictionary whose keys are the label strings (such as %Labelinfo) and whose values are the concatenation of the strings from one label line to the next. Basically this:

{%Labelinfo : string1+string2 , %Labelinfo : string2+string3+string4}

The problem is that there can be any number of lines between two "Labelinfo" lines. For example, there could be 5 lines between %Labelinfo and %Labelinfo2, and then, let's say, 4 lines between %Labelinfo2 and %Labelinfo3.

However, the line that contains "Labelinfo" always starts with the same character, for example %.

How do I solve this problem?

Upvotes: 0

Views: 1980

Answers (3)

LetzerWille

Reputation: 5658

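import re

d = {}

# Read the whole file; the with-block closes it afterwards
with open('fasta.txt') as f:
    text = f.read()

# Split on any run of whitespace and drop empty tokens
for el in [x for x in re.split(r'\s+', text) if x]:
    if el.startswith('%'):
        # A label starts a new, empty entry
        key = el
        d[key] = ''
    else:
        # Any other token is appended to the current label's value
        d[key] += el

print(d)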

{'%Labelinfo': 'string1string2', '%Labelinfo2': 'string3string4string5'}

Upvotes: 0

黄哥Python培训

Reputation: 249

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

d = {}

with open('Labelinfo.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('%'):
            # a label line starts a new entry
            key = line
            d[key] = ""
        else:
            d[key] += line + "+"

# drop the trailing "+" from each value
d = {key: d[key][:-1] for key in d}
print(d)

{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}
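A variation on the approach above, offered as a sketch rather than as either answerer's method: collect each label's strings in a list and join them with "+" once at the end, which avoids having to trim a trailing "+". The function name `parse_labels` and its `prefix` parameter are hypothetical; `prefix` defaults to "%" as in the question (a standard FASTA file marks its header lines with ">" instead). Skipping any lines that appear before the first label is an assumption of this sketch.

```python
import io


def parse_labels(lines, prefix="%"):
    """Map each label line to the lines that follow it, joined with '+'.

    Hypothetical helper: `lines` is any iterable of strings (an open
    file works), and a line starting with `prefix` is a label.
    """
    groups = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix):
            current = line
            groups[current] = []  # start a new, empty group
        elif current is not None:
            # assumption: lines before the first label are dropped
            groups[current].append(line)
    return {label: "+".join(parts) for label, parts in groups.items()}


sample = """%Labelinfo

string1

string2

%Labelinfo2

string3

string4

string5
"""

print(parse_labels(io.StringIO(sample)))
# {'%Labelinfo': 'string1+string2', '%Labelinfo2': 'string3+string4+string5'}
# (key order as shown on Python 3.7+)
```

With a real file, pass the file object directly: `with open('Labelinfo.txt') as f: d = parse_labels(f)`.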

Upvotes: 1

Neil

Reputation: 14313

Here's how I would write it:

The program loops through every line in the file and checks whether the line is empty; if it is, the line is ignored. Otherwise, we process it. A line starting with % denotes a label, so we add it to the dictionary as a new key and remember it in the variable current. Every following line is then appended to the value at key current, until the next % line is reached.

di = {}
with open("fasta.txt","r") as f:
    current = ""
    for line in f:
        line = line.strip()
        if line == "":
            continue
        if line[0] == "%":
            di[line] = ""
            current = line
        else:
            if di[current] == "":
                di[current] = line
            else:
                di[current] += "+" + line
print(di)

Output:

{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}

Note: In Python versions before 3.7, dictionaries do not preserve insertion order, so the keys may be printed out of order; they are still accessible in the same way. And, just a heads-up: your example output is slightly wrong; you forgot to put the 2 after one of the %Labelinfo keys.
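To illustrate the ordering point in the note: on Python 3.7 and later a plain dict already keeps keys in insertion order, so the output follows the file. If the code must also behave that way on older versions, `collections.OrderedDict` gives the same guarantee. Below is a minimal sketch of the loop above with that one swap, reading the question's sample from an in-memory string via `io.StringIO` instead of fasta.txt (the `sample` variable is only for illustration).

```python
import io
from collections import OrderedDict

# The question's example data, without the blank lines
sample = "%Labelinfo\nstring1\nstring2\n%Labelinfo2\nstring3\nstring4\nstring5\n"

di = OrderedDict()  # remembers insertion order on every Python version
current = ""
for line in io.StringIO(sample):
    line = line.strip()
    if line == "":
        continue
    if line[0] == "%":
        di[line] = ""
        current = line
    elif di[current] == "":
        di[current] = line
    else:
        di[current] += "+" + line

print(list(di))  # ['%Labelinfo', '%Labelinfo2']
```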

Upvotes: 0
