George

Reputation: 903

Within a loop print each element from a list on one line per loop

I have a file like this:

1:200-320    ['gene_id "xyz";transcript_id "xyzt"; exon_number "1"\n', 'gene_id "xyz";transcript_id "xyzt2"; exon_number "2"\n']
1:3000-3200    ['gene_id "xyz";transcript_id "xy"; exon_number "2"\n']

The file is extremely messy, and I want to tidy it up by first grouping terms, i.e. pull out the transcript_ids and have them written as transcript_id xyzt, xyzt2; and eventually repeat this for all the other terms.

My approach was to first remove all the messy characters using replace:

out=open('foo.txt','w')
with open('in.txt', 'r') as f:
    for line in f:
        tidyline = line.replace('[', "").strip()
        tidyline = tidyline.replace(']', "").strip()
        tidyline = tidyline.replace('"', "").strip()
        tidyline = tidyline.replace("'", "").strip()
        tidyline = tidyline.replace(",", "").strip()
        out.write("%s\n" %tidyline)

Then I use re to match the strings and pull out this info. I can do that, but I am not sure how to write the results to a file so that they stay on the appropriate lines.

import re

with open('foo.txt', 'r') as f:
    for line in f:
        result = re.findall('transcript_id\s(\w+)',line)    
        print result
['xyzt', 'xyzt2']
['xy']

My idea was to do something like:

string= "transcript_id %s,%s" %(results[0], results[1])
file.write("%s\n" %string)

but because the lists for each line have different lengths, that doesn't work.

Upvotes: 0

Views: 92

Answers (2)

Sergius

Reputation: 986

You can put all results in one list and then go through it:

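import re

transcript_id_list = []
with open('foo.txt', 'r') as f:
    for line in f:
        # collect every transcript_id found on this line
        result = re.findall(r'transcript_id.*?(\w+)', line)
        if result:
            transcript_id_list.extend(result)

# 'out.txt' is only a placeholder name for the output file
with open('out.txt', 'w') as out:
    for item in transcript_id_list:
        out.write("transcript_id %s\n" % item)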

Upvotes: 0

m00am

Reputation: 6298

The last of your problems (writing the lists of variable lengths) can be solved using the join method of string. Try this:

s = "transcript_id " + ",".join(results)
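As a rough sketch of how that could sit inside the loop: the helper name format_transcripts and the sample lines below are made up for illustration, and the lines only approximate what the replace clean-up step produces. The point is that join copes with any number of matches, so no indexing into the list is needed.

```python
import re

def format_transcripts(line):
    """Return 'transcript_id a,b,...' for one tidied line (hypothetical helper)."""
    results = re.findall(r'transcript_id\s(\w+)', line)
    # join works for one match, two matches, or more
    return "transcript_id " + ",".join(results)

# Sample lines, roughly as they would look after the replace() clean-up
lines = [
    "1:200-320    gene_id xyz;transcript_id xyzt; exon_number 1 gene_id xyz;transcript_id xyzt2; exon_number 2",
    "1:3000-3200    gene_id xyz;transcript_id xy; exon_number 2",
]

formatted = [format_transcripts(line) for line in lines]
for s in formatted:
    print(s)
```

Writing each s with out.write("%s\n" % s) instead of printing it keeps one output line per input line.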

To be on the safe side concerning your file operations, you should move the opening of the output file into the with statement, to avoid leaving files unclosed:

with open('in.txt', 'r') as f, open('foo.txt','w') as out:
    ...

Do you really need the intermediate step of writing foo.txt, or is this just a workaround?
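If the intermediate file turns out not to be needed, one possible sketch is to match the quoted value directly in the raw lines. The function name write_transcript_ids, the output file name, and the temporary directory are assumptions for the demo, and the sample data is copied from the question.

```python
import os
import re
import tempfile

def write_transcript_ids(in_path, out_path):
    """Write one 'transcript_id a,b' line per input line (hypothetical helper)."""
    with open(in_path, 'r') as f, open(out_path, 'w') as out:
        for line in f:
            # Match the quoted value directly, so no replace() clean-up is needed
            results = re.findall(r'transcript_id "(\w+)"', line)
            out.write("transcript_id " + ",".join(results) + "\n")

# Raw string: the \n sequences stay as literal backslash-n, as in the input file
sample = r'''1:200-320    ['gene_id "xyz";transcript_id "xyzt"; exon_number "1"\n', 'gene_id "xyz";transcript_id "xyzt2"; exon_number "2"\n']
1:3000-3200    ['gene_id "xyz";transcript_id "xy"; exon_number "2"\n']
'''

# Demo in a temporary directory so no real files are touched
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, 'in.txt')
out_path = os.path.join(tmpdir, 'out.txt')
with open(in_path, 'w') as f:
    f.write(sample)

write_transcript_ids(in_path, out_path)

with open(out_path) as f:
    output_lines = f.read().splitlines()
```

Both files are opened in the same with statement, so they are closed even if the loop raises an error.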

I hope this helps.

Upvotes: 1
