Wilson Mak

Reputation: 147

Python find and replace upon condition / with a function

String = n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l

I want the script to look at one pair at a time, meaning:

Evaluate n76a+q80a. If abs(76 - 80) < 10, replace the '+' between them with '_'; otherwise leave it unchanged. Then evaluate q80a+l83a and do the same thing, and so on for each adjacent pair.

The desired output should be:

n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l

What I tried is:

import re

def aa_dist(x):
    if abs(int(x[1:3]) - int(x[6:8])) < 10:
        print(re.sub(r'\+', '_', x))

with open(input_file, 'r') as alex:
    oligos_list = alex.read()
    aa_dist(oligos_list)

This is what I have so far. I know that my code will just replace every '+' with '_', because it only evaluates the first pair and then replaces them all. How should I do this?

Upvotes: 0

Views: 62

Answers (2)

Avinash Raj

Reputation: 174706

Using the re module only:

>>> import re
>>> s = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
>>> m = re.findall(r'(?=\b([^+]+\+[^+]+))', s)               # The capturing group inside a lookahead lets the matches overlap. See the demo (https://regex101.com/r/jO6zT2/13)
>>> m
['n76a+q80a', 'q80a+l83a', 'l83a+i153a', 'i153a+l203f', 'l203f+r207a', 'r207a+s211a', 's211a+s215w', 's215w+f216a', 'f216a+e283l']
>>> l = []
>>> for i in m:
...     if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
...         l.append(i.replace('+', '_'))
...     else:
...         l.append(i)
...
>>> re.sub(r'([a-z0-9]+)\1', r'\1', ''.join(l))
'n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l'
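To see why the lookahead matters, here is a small sketch comparing a plain findall with the lookahead version on a shortened string. The variable names plain and overlapping are just for illustration.

```python
import re

s = "n76a+q80a+l83a"

# Without a lookahead, each match consumes its text, so 'q80a' cannot
# start a second pair once it has been used as the end of the first one.
plain = re.findall(r"[^+]+\+[^+]+", s)

# With the pair captured inside a zero-width lookahead, nothing is consumed,
# so the next match can begin at the part that ended the previous pair.
overlapping = re.findall(r"(?=\b([^+]+\+[^+]+))", s)

print(plain)        # ['n76a+q80a']
print(overlapping)  # ['n76a+q80a', 'q80a+l83a']
```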

Or, by defining a separate function:

import re
def aa_dist(x):
    l = []
    m = re.findall(r'(?=\b([^+]+\+[^+]+))', x)
    for i in m:
        if abs(int(re.search(r'^\D*(\d+)', i).group(1)) - int(re.search(r'^\D*\d+\D*(\d+)', i).group(1))) < 10:
            l.append(i.replace('+', '_'))
        else:
            l.append(i)
    return re.sub(r'([a-z0-9]+)\1', r'\1',''.join(l))

string = 'n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l'
print(aa_dist(string))

Output:

n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
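Since the title asks about replacing "with a function", another option is to pass a callback to re.sub. The following is only a sketch, assuming every part contains exactly one run of digits; the name join_close_pairs and the limit parameter are made up for this example. The next number is read through a lookahead, so it is not consumed and can serve as the left-hand number of the following pair.

```python
import re

def join_close_pairs(s, limit=10):
    """Replace '+' with '_' where the neighbouring numbers differ by less than limit."""
    # group 1: the left number, group 2: any letters after it, then the '+'.
    # group 3 (inside the lookahead): the next number, peeked at but not consumed.
    pattern = r"(\d+)([^\d+]*)\+(?=[^\d+]*(\d+))"

    def repl(m):
        left, right = int(m.group(1)), int(m.group(3))
        sep = "_" if abs(left - right) < limit else "+"
        return m.group(1) + m.group(2) + sep

    return re.sub(pattern, repl, s)

string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
print(join_close_pairs(string))
# n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
```

This avoids the final de-duplication step, because each part is only ever written once.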

Upvotes: 1

Joran Beasley

Reputation: 113988

import itertools, re

my_string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
# first extract the numbers (a list, so it can be sliced on Python 3 too)
my_numbers = list(map(int, re.findall("[0-9]+", my_string)))
# split the string on + (useless comment)
parts = my_string.split("+")

def get_filler(pair):
    '''this function decides on the joiner'''
    a, b = pair
    return "_" if abs(a - b) < 10 else "+"

# figure out what fillers we need
fillers = map(get_filler, zip(my_numbers, my_numbers[1:]))
# zip stops before the last part, so it has to be added back at the end
print("".join(itertools.chain.from_iterable(zip(parts, fillers))) + parts[-1])

This is one way you might accomplish it... and it is also an example of worthless comments.
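The same split-and-rejoin idea can also be written as a plain loop without itertools. This is a rough sketch, assuming every part contains at least one digit (otherwise re.search returns None and the code raises an error); join_parts and limit are names invented for the example.

```python
import re

def join_parts(s, limit=10):
    """Rejoin the '+'-separated parts, using '_' between close neighbours."""
    parts = s.split("+")
    # Take the first run of digits from each part as its number.
    numbers = [int(re.search(r"\d+", p).group()) for p in parts]

    out = [parts[0]]
    # Walk over adjacent number pairs together with the right-hand part.
    for prev, cur, part in zip(numbers, numbers[1:], parts[1:]):
        out.append("_" if abs(prev - cur) < limit else "+")
        out.append(part)
    return "".join(out)

my_string = "n76a+q80a+l83a+i153a+l203f+r207a+s211a+s215w+f216a+e283l"
print(join_parts(my_string))
# n76a_q80a_l83a+i153a+l203f_r207a_s211a_s215w_f216a+e283l
```

Reading the number from each part separately keeps the numbers and the parts aligned, even if a part were to carry more than one run of digits.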

Upvotes: 2
