Grendel

Reputation: 783

Remove part of a string with coordinates in python

Hello, I have a tuple of (start, end) pairs such as:

indexes_to_delete=((6,9),(20,22),(2,4))

and a sequence, which I can open using Biopython:

Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Using the coordinates in indexes_to_delete, I would like to remove the parts from:

6 to 9
20 to 22
2 to 4

The coordinates are 1-based and inclusive, so the letters are numbered like this:

A B C D E F G H I J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 

After removing those ranges I should get this new_sequence (each letter keeps its original position):

A E J  K  L  M  N  O  P  Q  R  S  W  X  Y  Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26 

Upvotes: 1

Views: 161

Answers (4)

Chris Charley

Reputation: 6613

Here is another approach, using the third-party intspan package together with the standard library.

from string import ascii_uppercase
from intspan import intspan          # third-party: pip install intspan
from operator import itemgetter

indexes_to_delete=((6,9),(20,22),(2,4))

# add dummy 'a' so count begins with 1 for uppercase letters
array = ['a'] + list(ascii_uppercase)

# positions 1..26 that are not covered by any (start, end) range
indexes_to_keep = intspan.from_ranges(indexes_to_delete).complement(low=1, high=len(ascii_uppercase))

# itemgetter(*positions) picks array[i] for every kept position
slice_of = itemgetter(*indexes_to_keep)

print(' '.join(slice_of(array)))
print(' '.join(map(str,indexes_to_keep)))

Prints:

A E J K L M N O P Q R S W X Y Z
1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26
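If installing intspan is not an option, the two intspan calls can be approximated with plain sets. This is a minimal stdlib-only sketch, assuming the same 1-based, inclusive ranges; the names `deleted` and `letters` are chosen here for illustration and are not part of the answer above.

```python
from string import ascii_uppercase

indexes_to_delete = ((6, 9), (20, 22), (2, 4))

# Stand-in for intspan.from_ranges(...): every 1-based position covered by an inclusive range.
deleted = set()
for start, end in indexes_to_delete:
    deleted.update(range(start, end + 1))

# Stand-in for .complement(low=1, high=26): positions 1..26 not in `deleted`, in ascending order.
indexes_to_keep = sorted(set(range(1, len(ascii_uppercase) + 1)) - deleted)

# Convert each 1-based position to a 0-based string index (no dummy element needed).
letters = [ascii_uppercase[i - 1] for i in indexes_to_keep]

print(' '.join(letters))                     # A E J K L M N O P Q R S W X Y Z
print(' '.join(map(str, indexes_to_keep)))   # 1 5 10 11 12 13 14 15 16 17 18 19 23 24 25 26
```

Indexing with a list comprehension also avoids an `itemgetter` quirk: called with a single index it returns a bare item rather than a tuple, and with no indexes at all it raises `TypeError`.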

Upvotes: 1

Kirtiman Sinha

Reputation: 1001

A bit more readable version:

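indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

newSequence1 = ""

for idx, char in enumerate(Sequence1):
    for startIndex, endIndex in indexes_to_delete:
        # idx is 0-based but the coordinates are 1-based, hence idx+1
        if startIndex <= idx+1 <= endIndex:
            break
    else:
        # the inner loop finished without break: no range matched, so keep the character
        newSequence1 += char

print(newSequence1)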

Prints: AEJKLMNOPQRSWXYZ
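PEP 8 advises against relying on CPython's optimization of in-place string concatenation (`s += char`), so for long sequences a common variant of the same for/else loop collects characters in a list and joins once at the end. A minimal sketch; `kept_chars` is a name introduced here for illustration:

```python
indexes_to_delete = ((6, 9), (20, 22), (2, 4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

kept_chars = []
for position, char in enumerate(Sequence1, start=1):  # start=1 gives 1-based positions directly
    for start, end in indexes_to_delete:
        if start <= position <= end:
            break                  # position falls inside a deleted range: skip this char
    else:
        kept_chars.append(char)    # no break happened: the char survives

newSequence1 = "".join(kept_chars)
print(newSequence1)  # AEJKLMNOPQRSWXYZ
```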

Upvotes: 1

hitc

Reputation: 177

def delete_indexes(sequence, indexes_to_delete):
    # first convert the sequence to a dictionary of 1-based position -> letter
    seq_dict = {i+1: sequence[i] for i in range(len(sequence))}
    # collect all the keys that need to be removed (both ends inclusive)
    keys_to_delete = []
    for start, end in indexes_to_delete:
        keys_to_delete += range(start, end+1)
    # remove the keys from the dictionary; pop(key, None) tolerates overlapping ranges
    for key in keys_to_delete:
        seq_dict.pop(key, None)
    return seq_dict

You can use this function to get the new sequence.

new_sequence = delete_indexes(Sequence1, indexes_to_delete)

Note that new_sequence is still a Python dictionary. You can convert it to a list, a str, or whatever you need. For example, to turn it back into a str like the original Sequence1:

print(''.join(new_sequence.values()))

Prints:

AEJKLMNOPQRSWXYZ

You can get the original coordinates of the remaining letters with new_sequence.keys().
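The keys and values can be read side by side because dictionaries preserve insertion order (a language guarantee since Python 3.7), so the n-th key is the original position of the n-th remaining letter. The sketch below is self-contained: its dict comprehension is a compact stand-in that builds the same {position: letter} mapping as delete_indexes, not hitc's code itself.

```python
indexes_to_delete = ((6, 9), (20, 22), (2, 4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# 1-based positions to drop (inclusive ranges).
deleted = {pos for start, end in indexes_to_delete for pos in range(start, end + 1)}

# Same shape as the result of delete_indexes: {original position: letter}.
new_sequence = {pos: ch for pos, ch in enumerate(Sequence1, start=1) if pos not in deleted}

letters = "".join(new_sequence.values())
coordinates = list(new_sequence.keys())
print(letters)      # AEJKLMNOPQRSWXYZ
print(coordinates)  # [1, 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 23, 24, 25, 26]
```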

Upvotes: 1

Andrej Kesely

Reputation: 195543

indexes_to_delete=((6,9),(20,22),(2,4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

s = ''.join(ch for i, ch in enumerate(Sequence1, 1) if not any(a <= i <= b for a, b in indexes_to_delete))

print(s)

Prints:

AEJKLMNOPQRSWXYZ
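The one-liner above checks every range for every character, so its cost grows with len(sequence) × number of ranges. For long sequences, a different technique (swapped in here, not part of the answer) is to sort the ranges once and copy only the slices between them. A minimal sketch, assuming a plain str (for a Biopython record that would mean str(record.seq)); `delete_ranges` is a hypothetical helper name:

```python
def delete_ranges(sequence, ranges):
    """Remove 1-based, inclusive (start, end) ranges from a str using slices."""
    pieces = []
    cursor = 0  # 0-based index of the first character not yet copied or skipped
    for start, end in sorted(ranges):
        # 1-based inclusive (start, end) is the 0-based half-open slice [start - 1, end)
        if start - 1 > cursor:
            pieces.append(sequence[cursor:start - 1])
        cursor = max(cursor, end)  # max() keeps overlapping or nested ranges correct
    pieces.append(sequence[cursor:])
    return "".join(pieces)


indexes_to_delete = ((6, 9), (20, 22), (2, 4))
Sequence1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(delete_ranges(Sequence1, indexes_to_delete))  # AEJKLMNOPQRSWXYZ
```

Sorting first is what lets a single cursor walk the string left to right, so each character is touched once regardless of how many ranges there are.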

Upvotes: 2
