Holly

Reputation: 11

python newbie - where is my if/else wrong?

Complete beginner so I'm sorry if this is obvious!

I have a file where each line is either name | +/- or IG_name | 0, in a long list like so:

S1      +
IG_1    0
S2      -
IG_S3   0
S3      +
S4      -
dnaA    +
IG_dnaA 0

Everything that starts with IG_ has a corresponding name. I want to add the + or - of that name to the IG_name, e.g. IG_S3 should be + because S3 is +.

The information is gene names and strand information, IG = intergenic region. Basically I want to know which strand the intergenic region is on.

What I think I want:

open file
for every line, if the line starts with IG_*
    find the line with *
    print("IG_" and the line it found)
else 
    print line

What I have:

with open(sys.argv[2]) as geneInfo:
    with open(sys.argv[1]) as origin:
            for line in origin:
                    if line.startswith("IG_"):
                            name = line.split("_")[1]
                            nname = name[:-3]
                            for newline in geneInfo:
                                    if re.match(nname, newline):
                                            print("IG_"+newline)
                    else:
                            print(line)

where origin is the mixed list and geneInfo has only the names not IG_names.

With this code I end up with a list containing only the lines printed by the else branch.

S1  +

S2  -

S3  +

S4  -

dnaA    +

My problem is that I don't know what is wrong, so I don't know what to search for to (attempt to) fix it!

Upvotes: 0

Views: 110

Answers (3)

Steve

Reputation: 1282

Does this do what you want?

from __future__ import print_function

import sys

# Read and store all the gene info lines, keyed by name
gene_info = dict()                            
with open(sys.argv[2]) as gene_info_file:
    for line in gene_info_file:
        tokens = line.split()
        name = tokens[0].strip()
        gene_info[name] = line


# Read the other file and lookup the names
with open(sys.argv[1]) as origin_file:
    for line in origin_file:
        if line.startswith("IG_"):
            # First column without the "IG_" prefix, e.g. "IG_S3   0" -> "S3"
            nname = line.split()[0][len("IG_"):]
            if nname in gene_info:
                lookup_line = gene_info[nname]
                # The stored line already ends with a newline
                print("IG_" + lookup_line, end="")
            else:
                pass # what do you want to do in this case?
        else:
            print(line, end="")

Upvotes: 0

xZeasy

Reputation: 132

 nname = name[:-3]

Python's sequence slicing is very powerful, but it can be tricky to understand correctly.

When you write [:-3], you take everything except the last three items. The catch is that if the sequence has fewer than three elements, Python does not raise an error; it returns an empty sequence (here an empty string, since name is a string).

I think this may be where things go wrong: since there are not many characters per line, the slice may return an empty string. If you could say exactly what you want it to return there, with an example, it would help a lot, as I don't really know what you're trying to get with your slicing.
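
To make the slicing behaviour concrete, here is a small sketch of what name[:-3] yields. The two sample lines are hypothetical: the question does not say whether the columns are separated by spaces or by a tab, so both are shown as assumptions.

```python
# Hypothetical sample lines; the real column separator is an assumption.
spaced = "IG_S3   0\n"   # three spaces between the columns
tabbed = "IG_S3\t0\n"    # a single tab between the columns

for line in (spaced, tabbed):
    name = line.split("_")[1]   # everything after the first underscore
    nname = name[:-3]           # drop the last three characters
    print(repr(name), "->", repr(nname))

# Slicing never raises on a short sequence; it just returns an empty one.
short = "ab"
print(repr(short[:-3]))

# Splitting on whitespace avoids counting characters at all.
robust = spaced.split()[0][len("IG_"):]
print(repr(robust))
```

With the space-separated line the slice keeps trailing whitespace, which may be enough to stop a later comparison from matching; the whitespace split gives the bare name either way.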

Upvotes: 0

roganjosh

Reputation: 13175

Below is some step-by-step annotated code that hopefully does what you want (though instead of using print I have aggregated the results into a list so you can actually make use of it). I'm not quite sure what happened with your existing code (especially how you're processing two files?)

s_dict = {}
ig_list = []

with open('genes.txt', 'r') as infile: # Simulating reading the file you pass in sys.argv
    for line in infile:
        if line.startswith('IG_'):
            ig_list.append(line.split()[0]) # Collect all our IG values for later
        else:
            s_name, value = line.split()    # Separate out the S value and its operator
            s_dict[s_name] = value.strip()  # Add to dictionary to map S to operator

# Now you can go back through your list of IG values and append the appropriate operator
pulled_together = [] 

for item in ig_list:
    s_value = item.split('_')[1]
    # The following will look for the operator mapped to the S value. If it is
    # not found, it will instead give you 'Not found'
    corresponding_operator = s_dict.get(s_value, 'Not found') 
    pulled_together.append([item, corresponding_operator])

print('List structure')
print(pulled_together)
print('\n')

print('Printout of each item in list')
for item in pulled_together:
    print(item[0] + '\t' + item[1])
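
As for what happened with the existing code, one likely contributor (my reading of the snippet, not something I have run against the real files) is that an open file is an iterator. The inner for loop over geneInfo reads it to the end the first time an IG_ line is found, so every later search loops over nothing. The sketch below uses io.StringIO as a stand-in for the file, and the contents are made up for illustration.

```python
import io

# Stand-in for the gene info file; io.StringIO behaves like an open text file.
gene_info_file = io.StringIO("S1\t+\nS2\t-\nS3\t+\n")

# The first loop consumes every line of the file.
first_pass = [line.split()[0] for line in gene_info_file]
# The second loop starts at end-of-file, so it sees no lines at all.
second_pass = [line.split()[0] for line in gene_info_file]

print(first_pass)
print(second_pass)

# Either rewind with seek(0) before each search, or read the file once
# into a dict and look names up from there.
gene_info_file.seek(0)
strand_by_name = dict(line.split() for line in gene_info_file)
print(strand_by_name.get("S3"))
```

Reading the file once into a dictionary, as above, avoids both the rewinding and the repeated scans.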

Upvotes: 1
