Sam Vanbergen

Reputation: 145

Determine whether a combination of protein fragments might cover a complete protein sequence

A FASTA file contains a single protein sequence. A second FASTA file contains sequences that are fragments of the sequence in the first file. Compute the molecular weight of each sequence and, using those weights, determine whether some combination of fragments could cover the complete protein sequence without the fragments overlapping.
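As a minimal, standard-library-only sketch of the first step (reading the FASTA records and computing weights), the code below assumes approximate average residue masses from a common amino-acid mass table and treats a chain's weight as the sum of its residue masses plus one water. The names read_fasta, molecular_weight, RESIDUE_MASS, WATER and the file name in the comment are hypothetical, not from the question. If Biopython is installed, Bio.SeqUtils.molecular_weight is an alternative.

```python
# Approximate average residue masses in daltons (free amino acid minus one water).
# Assumption: values from a standard average-mass table; check against your reference.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once per chain for the free N- and C-termini


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines (e.g. an open file)."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def molecular_weight(seq):
    """Average weight of a linear peptide: residue masses plus one water.

    Raises KeyError for letters not in RESIDUE_MASS (e.g. X or B).
    """
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


# With real files (hypothetical name): with open("protein.fasta") as fh: records = list(read_fasta(fh))
example = [">full", "GAG", ">frag1", "GA", ">frag2", "G"]
for name, seq in read_fasta(example):
    print(name, round(molecular_weight(seq), 2))
```

Because every fragment carries its own water, k fragments that exactly tile the protein have weights summing to the full weight plus (k - 1) * WATER. It is therefore safer to compare residue sums (weight minus WATER) and to use a tolerance such as math.isclose instead of ==.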

I've tried to write the following script, but I haven't been able to turn it into working code.

So in seqs I've put the weights of the protein fragments, and in total_weight I've put the weight of the complete protein, to test whether the logic I'm trying to use works.

seqs = [50,70,30]
total_weight = 100
current_weight = 0
for weight in seqs:
    if current_weight + weight == total_weight:
        print(True)
    elif current_weight + weight < total_weight:
        current_weight += weight
    if current_weight > total_weight:
        current_weight -= weight

Obviously, in this case I would want this code to print True, since 70 + 30 = 100. To solve this, I'd like to omit the first element of the seqs list and then rerun the for loop I've made, but I haven't managed to write the code that drops the first element and runs the loop again on the new seqs list. Can someone point me in the right direction?
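The retry idea from the question can be sketched like this, assuming the greedy loop itself stays unchanged and is simply restarted on seqs[1:], seqs[2:], and so on. The helper names greedy_from and retry_dropping_front are hypothetical.

```python
def greedy_from(seqs, total_weight):
    """Run the question's greedy loop once; return True if it hits total_weight exactly."""
    current_weight = 0
    for weight in seqs:
        if current_weight + weight == total_weight:
            return True
        if current_weight + weight < total_weight:
            current_weight += weight
    return False


def retry_dropping_front(seqs, total_weight):
    """Drop leading elements one at a time and rerun the greedy loop on what remains."""
    for start in range(len(seqs)):
        if greedy_from(seqs[start:], total_weight):
            return True
    return False


print(retry_dropping_front([50, 70, 30], 100))  # True, found on seqs[1:] == [70, 30]
```

Note that this is still greedy: retry_dropping_front([30, 40, 50, 20], 100) returns False even though 30 + 50 + 20 = 100, so a complete answer has to consider every subset, not just suffixes of the list.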

Upvotes: 1

Views: 316

Answers (3)

Drees

Reputation: 720

Here is another recursive approach that actually finds values in your list that add up to 100. It prints each partial list it tries and returns True together with the matching list:

seqs = [50,70,30]
total_weight = 100

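def protein_summation_check(target, lst, newLst=None):
    # Avoid a mutable default argument: a shared [] would leak state between calls.
    if newLst is None:
        newLst = []
    print(newLst)  # show the combination currently being tried
    for index, protein in enumerate(lst):
        newLst.append(protein)
        # Recurse only on later elements so each value is used at most once.
        result = protein_summation_check(target, lst[index + 1:], newLst)
        if result:
            return result
        if sum(newLst) == target:
            return (True, list(newLst))
        newLst.pop()
    return False

print(protein_summation_check(total_weight, seqs))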

Here is a for-loop version that doesn't work for every input, but does work for the one you provided:

seqs = [50,70,30]
total_weight = 100
current_weight = 0

for index, item in enumerate(seqs):
    if  current_weight == total_weight or item == total_weight:
        print("True")
        break
    for otheritem in seqs[index+1:]:
        if otheritem == total_weight:
            current_weight = total_weight
            break
        if current_weight < total_weight:
            current_weight += otheritem + item
        if current_weight > total_weight:
            if otheritem >= total_weight:
                current_weight -= item
            else:
                current_weight -= otheritem
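For comparison, here is a sketch of a plain for-loop that does handle every input, assuming positive integer weights as in the question. It swaps in a different technique, the standard subset-sum approach of tracking every reachable total along with one list of weights that produces it; the name find_cover is hypothetical.

```python
def find_cover(seqs, total_weight):
    """Iterative subset sum: return a list of weights summing to total_weight, or None."""
    # Maps each reachable sum to one list of weights that produces it.
    reachable = {0: []}
    for weight in seqs:
        # Iterate over a snapshot so each weight is used at most once per combination.
        for current, used in list(reachable.items()):
            new_sum = current + weight
            if new_sum <= total_weight and new_sum not in reachable:
                reachable[new_sum] = used + [weight]
    return reachable.get(total_weight)


print(find_cover([50, 70, 30], 100))  # [70, 30]
```

The number of stored sums never exceeds total_weight + 1, so for integer weights this stays fast even when the list is long.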

Upvotes: 2

IQbrod

Reputation: 2265

Your code won't print True, because it walks through the list in order and, once 50 has been added, never removes it:

0 + 50 = 50
50 + 70 = 120 > 100 => 70 is skipped
50 + 30 = 80

For each entry you can either add it to the group or skip it, so your function needs two parameters: the weight grouped so far and the remaining entries:

def calculate(current: int, next: list[int]) -> bool:
    pass

First, check whether the current weight equals your total weight; if it doesn't and there is nothing left to add, you can't get any further:

total_weight=100
current_weight=0
data=[50,70,30]

def calculate(current: int, next: list[int]) -> bool:
    if current == total_weight:
        return True
    if not next:
        return False

Now check whether either choice (with or without the next entry) reaches your total:

def calculate(current: int, next: list[int]) -> bool:
    if current == total_weight:
        return True
    if not next:
        return False
    # Edit: x only needs to be computed when adding next[0] doesn't exceed the total
    x = False
    if current + next[0] <= total_weight:
        x = calculate(current + next[0], next[1:])  # with next[0]
    y = calculate(current, next[1:])  # without next[0]
    return x or y

print(calculate(current_weight, data))

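On big datasets this gets slow, since it can explore up to 2^n combinations, so you might need to make it faster, for example by aborting the remaining calculation steps as soon as a match is found. Threads are unlikely to help much with CPU-bound pure-Python code because of the GIL.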
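One way to stop early, sketched here under the assumption of positive integer weights, is to let Python's or short-circuit so the "without" branch only runs when the "with" branch fails, and to memoize on (index, current) with functools.lru_cache so repeated states aren't recomputed. Walking an index instead of slicing next[1:] also avoids copying the list at every step. The names can_reach and search are hypothetical.

```python
from functools import lru_cache


def can_reach(total_weight, weights):
    """Return True if some subset of weights sums exactly to total_weight."""
    weights = tuple(weights)

    @lru_cache(maxsize=None)
    def search(index, current):
        if current == total_weight:
            return True
        if index == len(weights):
            return False
        # "with" branch, pruned when it would overshoot (valid for positive weights)
        with_it = (current + weights[index] <= total_weight
                   and search(index + 1, current + weights[index]))
        # `or` short-circuits: the "without" branch runs only if "with" failed
        return with_it or search(index + 1, current)

    return search(0, 0)


print(can_reach(100, [50, 70, 30]))  # True
print(can_reach(100, [50, 70, 40]))  # False
```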

Upvotes: 1

Steven Barnard

Reputation: 604

Look into using permutations from itertools on your seqs list:

from itertools import permutations
from pprint import pprint

seqs = [50, 70, 30]
perm_list = list(permutations(seqs))
pprint(perm_list)

provides the following output:

[(50, 70, 30),
 (50, 30, 70),
 (70, 50, 30),
 (70, 30, 50),
 (30, 50, 70),
 (30, 70, 50)]

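Then you can loop over those permutations and check their running (prefix) sums to see which values add up to the total weight. Since every subset of seqs is a prefix of some permutation, this finds any combination that works.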
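A sketch of that loop, assuming positive integer weights: cover_by_permutations walks the prefix sums of each permutation with itertools.accumulate. Because a sum doesn't depend on order, cover_by_combinations swaps in itertools.combinations, which visits each subset once instead of generating n! orderings. Both function names are hypothetical.

```python
from itertools import accumulate, combinations, permutations


def cover_by_permutations(seqs, total_weight):
    """Check prefix sums of every permutation; return the first matching prefix or None."""
    for perm in permutations(seqs):
        for length, running in enumerate(accumulate(perm), start=1):
            if running == total_weight:
                return perm[:length]
    return None


def cover_by_combinations(seqs, total_weight):
    """Check each subset once, smallest first; return the first match or None."""
    for size in range(1, len(seqs) + 1):
        for combo in combinations(seqs, size):
            if sum(combo) == total_weight:
                return combo
    return None


seqs = [50, 70, 30]
print(cover_by_permutations(seqs, 100))  # (70, 30)
print(cover_by_combinations(seqs, 100))  # (70, 30)
```

Both are brute force, so they are only practical for a handful of fragments; for n fragments the combinations version checks up to 2^n - 1 subsets, while the permutations version can generate all n! orderings.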

Hope this is of use, Cheers!

Upvotes: 1
