user587646

Reputation: 117

How do I remove the second and all following digits after a period in the first column of each line?

How do I remove everything after the first digit that follows the period in column one?
For example,

HP_000083.21423      N  -1  NO  99.8951%    0.000524499999999983
NP_075561.1_1908    N   -1  NO  99.9697%    0.000151499999999971

I would like to remove "_1908" from "NP_075561.1_1908"

and "1423" from "HP_000083.21423"

without removing anything from the subsequent columns.

The expected rows would be:

HP_000083.2         N          -1       NO        99.8951%  0.000524499999999983
NP_075561.1             N           -1      NO        99.9697%  0.000151499999999971

Here's my code (some of you provided part of this solution in the past):

    for line in fname:
        line = re.sub('[\(\)\{\}\'\'\,<>]','', line)
        line = re.sub(r"(\.\d+)_\d+", r"\1", line)
        fields = line.rstrip("\n").split()
        outfile.write('%s  %s  %s  %s  %s  %s\n' % (fields[0],fields[1],fields[2],fields[3],fields[4],(fields[5])))

Thanks in advance guys, Cheers,

Upvotes: 0

Views: 141

Answers (2)

Andrew Clark

Reputation: 208475

Here is a solution with a pretty minimal change to the code you provided:

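    import re

    for line in fname:
        # Strip brackets, braces, quotes, commas and angle brackets.
        line = re.sub(r"[(){}',<>]", '', line)
        # Keep the period and the first digit after it; drop any further digits
        # and an optional "_<digits>" suffix. count=1 touches only the first match.
        line = re.sub(r"(\.\d)\d*_?\d*", r"\1", line, count=1)
        fields = line.rstrip("\n").split()
        outfile.write('%s  %s  %s  %s  %s  %s\n' % (fields[0], fields[1], fields[2], fields[3], fields[4], fields[5]))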
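As a quick check, here is a minimal sketch that applies the second substitution to the two sample rows from the question. The helper name `trim_id` and the `samples` list are made up for illustration; only the pattern comes from the code above.

```python
import re

def trim_id(line):
    # Hypothetical helper: keep the period and the first digit after it,
    # drop any further digits and an optional "_<digits>" suffix.
    # count=1 limits the substitution to the first match on the line,
    # so later columns such as "99.8951%" are left alone.
    return re.sub(r"(\.\d)\d*_?\d*", r"\1", line, count=1)

# Sample rows copied from the question.
samples = [
    "HP_000083.21423      N  -1  NO  99.8951%    0.000524499999999983",
    "NP_075561.1_1908    N   -1  NO  99.9697%    0.000151499999999971",
]

for s in samples:
    print(trim_id(s))
```

Without `count=1`, the pattern would also match inside the percentage and the last column, which is why the substitution is restricted to the first match.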

Upvotes: 0

Sven Marnach

Reputation: 601599

I'd avoid using regular expressions in this case. You can easily make do with standard string methods:

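    for line in infile:
        # Split off the first column; the rest of the line is kept untouched.
        first_col, rest = line.split(" ", 1)
        # Keep everything up to and including the first digit after the period.
        first_col = first_col[:first_col.index(".") + 2]
        output_line = " ".join((first_col, rest))
        outfile.write(output_line)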
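For reference, a self-contained sketch of the same slicing idea wrapped in a function, so it can be tried without any files. The name `trim_first_column` is hypothetical. Two choices here are assumptions rather than part of the code above: it splits on any whitespace (so tab-separated input would also work), and it leaves the first column unchanged when it contains no period. It still assumes every line has at least two columns.

```python
def trim_first_column(line):
    # Hypothetical helper. split(None, 1) splits on the first run of
    # whitespace (spaces or tabs) and keeps the remainder intact,
    # including the trailing newline.
    first_col, rest = line.split(None, 1)
    dot = first_col.find(".")
    if dot != -1:
        # Keep everything up to and including the first digit after the period.
        first_col = first_col[:dot + 2]
    return first_col + " " + rest

print(trim_first_column("HP_000083.21423      N  -1  NO  99.8951%\n"), end="")
print(trim_first_column("NP_075561.1_1908    N   -1  NO  99.9697%\n"), end="")
```

Because only the first column is sliced, the periods in the later columns never come into play.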

Upvotes: 5
