Reputation: 1316
I have the following file:
cat file.txt
ID Location
MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C
MNS2 NC_000142.12:g.a144120552C,N>D
MNS3 NC_000142.12:g.a144120559C>N
I would like to transform it into this:
ID Location
MNS1 NC_000004.12:144120555;NC_001423.23:144120513
MNS2 NC_000142.12:144120552
MNS3 NC_000142.12:144120559
I would like to remove everything except the numbers that appear between ':' and ';' (or the end of the line).
For example, I tried:
echo "NC_000004.12:g.d.a144120555T>C;" | sed 's/:[^0-9]*/:/g; s/[^0-9]*;/;/g; s/[^0-9]*$//g'
Desired output:
NC_000004.12:144120555
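On the three data lines of file.txt, the substitutions from this attempt already yield the desired output. The catch when running them over the whole file is the last one, s/[^0-9]*$//g: the header "ID Location" contains no digits, so that pattern matches the entire line and blanks it. Below is a minimal sketch that keeps the three substitutions unchanged and applies them only from line 2 onward with a 2,$ address. The helper name clean_locations and the printf that recreates the sample file are illustrative assumptions, not part of the question.

```shell
# Recreate the sample input from the question (illustrative setup).
printf '%s\n' \
  'ID Location' \
  'MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C' \
  'MNS2 NC_000142.12:g.a144120552C,N>D' \
  'MNS3 NC_000142.12:g.a144120559C>N' > file.txt

# Hypothetical helper: the three substitutions from the question,
# restricted to lines 2..end so the header line is printed unchanged.
clean_locations() {
  sed '2,$ {
    s/:[^0-9]*/:/g
    s/[^0-9]*;/;/g
    s/[^0-9]*$//g
  }' "$1"
}

clean_locations file.txt
```

Commands inside the braces are separated by newlines rather than ';', which keeps the script portable between GNU and BSD sed.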
Upvotes: 0
Views: 90
Reputation: 91518
If Perl is an option for you:
cat file.txt
ID Location
MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C
MNS2 NC_000142.12:g.a144120552C,N>D
MNS3 NC_000142.12:g.a144120559C>N
perl -ape 's/:\D+(\d+).*?(?=;|$)/:$1/g' file.txt
ID Location
MNS1 NC_000004.12:144120555;NC_001423.23:144120513
MNS2 NC_000142.12:144120552
MNS3 NC_000142.12:144120559
Explanation:
s/ # substitute
: # a colon
\D+ # 1 or more non-digits
(\d+) # group 1: 1 or more digits
.*? # 0 or more of any character except newline, non-greedy
(?=;|$) # positive lookahead: make sure a semicolon or the end of line follows
/ # with
: # a colon
$1 # the content of group 1 (i.e. the digits)
/g # end; global flag, replace every match on the line
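If Perl is not available, the same pattern can be approximated with sed -E. POSIX extended regular expressions have neither \D/\d nor lazy quantifiers or lookahead, so this sketch assumes two substitutions: [^0-9]+ for \D+ and [0-9]+ for \d+, and the greedy negated class [^;]* in place of .*?(?=;|$), since both stop at the first ';' or at the end of the line. The helper name locations_ere and the sample-file setup are hypothetical.

```shell
# Recreate the sample input from the question (illustrative setup).
printf '%s\n' \
  'ID Location' \
  'MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C' \
  'MNS2 NC_000142.12:g.a144120552C,N>D' \
  'MNS3 NC_000142.12:g.a144120559C>N' > file.txt

# Hypothetical helper: sed -E port of the Perl one-liner.
#   :          a colon
#   [^0-9]+    1 or more non-digits (stands in for \D+)
#   ([0-9]+)   group 1: the digits (stands in for (\d+))
#   [^;]*      everything up to the next ';' or end of line
# Replaced by ':' followed by group 1; /g repeats for each entry.
locations_ere() {
  sed -E 's/:[^0-9]+([0-9]+)[^;]*/:\1/g' "$1"
}

locations_ere file.txt
```

The header line contains no colon, so it passes through untouched without needing an address.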
Upvotes: 1
Reputation: 817
This might do the trick!
sed -i.bak 's/g\.//g; s/\w>\w//g' filename
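Because -i.bak rewrites filename in place (keeping the original as filename.bak), it can help to preview the result first by running the same expression without -i. A minimal sketch, assuming GNU sed (where \w matches a letter, digit or underscore) and the sample data from the question; preview_answer is a hypothetical helper name. On that sample, the expression removes the g. prefixes and the X>Y changes, but fragments such as d.a, c.a and G<C remain, so further substitutions would be needed to reach the desired output.

```shell
# Recreate the sample input from the question (illustrative setup).
printf '%s\n' \
  'ID Location' \
  'MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C' \
  'MNS2 NC_000142.12:g.a144120552C,N>D' \
  'MNS3 NC_000142.12:g.a144120559C>N' > file.txt

# Hypothetical helper: the answer's expression without -i, so file.txt
# is left untouched and the result is written to stdout for inspection.
preview_answer() {
  sed 's/g\.//g; s/\w>\w//g' "$1"
}

preview_answer file.txt
```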
For the (NC.*?): part that you want to concatenate, a bit more explanation about the expected output would help, although this might work:
s/NC[0-9]?:/:/
Upvotes: 2