justaguy

Reputation: 3022

awk or sed to remove text after last digit in field

In the tab-delimited file below, I am trying to remove the text after the last digit in $1. I have tried two sed commands and gotten close, but not to the desired output. I don't know if I am using the best approach. Thank you :)

file

chr7:55249071C>T    EGFR
chr7:55242469_55242477delTTAAGAGAAG EGFR

desired output

chr7:55249071   EGFR
chr7:55242469_55242477  EGFR

sed

sed 's/[0-9]//g' file

chr:C>T EGFR
chr:_delTTAAGAGAAG  EGFR

sed 's/[a-z]//g' file

7:55249071C>T   EGFR
7:55242469_55242477TTAAGAGAAG   EGFR

Upvotes: 1

Views: 211

Answers (3)

potong

Reputation: 58371

This might work for you (GNU sed):

sed 's/\(.*[0-9]\)\S\+/\1/' file

This matches everything up to the last digit on the line, stores it as a back-reference, and removes the non-space characters that follow it.
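To try the command, the sample can be piped in with real tab characters (the copy of the file on this page shows spaces). A minimal sketch, assuming GNU sed, since \S and \+ are GNU extensions:

```shell
#!/bin/sh
# Sample input from the question, with real tabs between the two columns.
# GNU sed only: \S (non-whitespace) and \+ (one or more) are GNU extensions.
# \(.*[0-9]\) greedily captures everything up to the last digit on the line,
# \S\+ consumes the non-space characters after it, and \1 keeps only the capture.
out=$(printf 'chr7:55249071C>T\tEGFR\nchr7:55242469_55242477delTTAAGAGAAG\tEGFR\n' |
  sed 's/\(.*[0-9]\)\S\+/\1/')
printf '%s\n' "$out"
```

One caveat worth checking against real data: if a first field already ends in a digit (for example chr7:123), \S\+ still needs at least one character, so the match backtracks and the last digit is removed (chr7:12). Using \S* instead of \S\+ leaves such lines unchanged.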

Upvotes: 1

oguz ismail

Reputation: 50750

If your input is guaranteed to have only two tab-delimited fields, you can use this:

sed 's/[^0-9]\+\t/\t/' file
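The \+ and \t in this command are GNU sed extensions. A portable sketch for POSIX sed, assuming the same two-field input, spells "one or more" as [^0-9][^0-9]* and builds a literal tab with printf:

```shell
#!/bin/sh
# POSIX BREs have no \+, and POSIX sed need not understand \t,
# so create a real tab character and splice it into the script.
tab=$(printf '\t')

# Replace the run of non-digits that sits directly before the tab with just the tab.
out=$(printf 'chr7:55249071C>T\tEGFR\nchr7:55242469_55242477delTTAAGAGAAG\tEGFR\n' |
  sed "s/[^0-9][^0-9]*$tab/$tab/")
printf '%s\n' "$out"
```

As with the GNU version, a first field that already ends in a digit is left alone, because there is no non-digit run directly before the tab to match.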

Upvotes: 1

Sundeep

Reputation: 23667

You can use a negated character class and an end-of-string anchor so that only the non-digits at the end of the field are deleted:

$ awk 'BEGIN{FS=OFS="\t"} {sub(/[^0-9]+$/, "", $1)} 1' ip.txt
chr7:55249071   EGFR
chr7:55242469_55242477  EGFR
  • BEGIN{FS=OFS="\t"} sets both the input and output field separators to a tab
  • sub(/[^0-9]+$/, "", $1) performs the substitution on the first field only; this makes it much easier to adapt to a different field than the sed solutions
  • 1 is an idiomatic way to print $0: an always-true condition with no action triggers awk's default print
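The point about adapting to other fields can be made concrete by passing the field number in with -v. A sketch on a hypothetical copy of the question's data with the two columns swapped, so the field to trim is $2; the variable name f is introduced here for illustration only:

```shell
#!/bin/sh
# Hypothetical input: the question's rows with the columns swapped.
# -v f=2 passes the field number in; $f then refers to that field.
# sub() strips the trailing run of non-digits from that field only.
out=$(printf 'EGFR\tchr7:55249071C>T\nEGFR\tchr7:55242469_55242477delTTAAGAGAAG\n' |
  awk -v f=2 'BEGIN{FS=OFS="\t"} {sub(/[^0-9]+$/, "", $f)} 1')
printf '%s\n' "$out"
```

A caveat that also applies to the $1 version: if the chosen field contains no digit at all, [^0-9]+$ matches the entire field and empties it, so running this with f=1 on the swapped file would blank out EGFR.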

Upvotes: 2
