Reputation: 3022
In the tab-delimited file below I am trying to remove the text after the last digit in $1. I have tried two sed commands and gotten close, but not to the desired output. I don't know if I am using the best approach. Thank you :).
file
chr7:55249071C>T EGFR
chr7:55242469_55242477delTTAAGAGAAG EGFR
desired output
chr7:55249071 EGFR
chr7:55242469_55242477 EGFR
sed
sed 's/[0-9]//g' file
chr:C>T EGFR
chr:_delTTAAGAGAAG EGFR
sed 's/[a-z]//g' file
7:55249071C>T EGFR
7:55242469_55242477TTAAGAGAAG EGFR
Upvotes: 1
Views: 211
Reputation: 58371
This might work for you (GNU sed):
sed 's/\(.*[0-9]\)\S\+/\1/' file
Match up to the last numeric digit and store as a back reference and remove any non-space characters following it.
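As a rough sketch of the same idea for seds without the GNU-only \S and \+, the command can be written with extended regular expressions (-E) and a POSIX character class. The sample variable below is only a stand-in for the two lines of the question's file.

```shell
# Stand-in for the question's file: two tab-delimited lines.
sample=$(printf 'chr7:55249071C>T\tEGFR\nchr7:55242469_55242477delTTAAGAGAAG\tEGFR\n')

# Capture everything up to the last digit that is followed by
# non-whitespace characters, then drop those trailing characters.
result=$(printf '%s\n' "$sample" | sed -E 's/(.*[0-9])[^[:space:]]+/\1/')
printf '%s\n' "$result"
```

Because .* is greedy and applies to the whole line, a digit in a later field (for example a gene name such as TP53) would move the match into that field, so this approach assumes digits appear only in $1.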
Upvotes: 1
Reputation: 50750
If your input is guaranteed to have only two tab-delimited fields, you can use this (GNU sed, since it relies on \+ and \t):
sed 's/[^0-9]\+\t/\t/' file
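If GNU sed is not available, one possible variant passes a literal tab in through a shell variable and uses * instead of \+. This is only a sketch under the same two-field assumption; the tab and sample names are made up for the illustration.

```shell
# Hold a literal tab character, since \t in a regex is a GNU extension.
tab=$(printf '\t')

# Stand-in for the question's file: two tab-delimited lines.
sample=$(printf 'chr7:55249071C>T\tEGFR\nchr7:55242469_55242477delTTAAGAGAAG\tEGFR\n')

# Replace the run of non-digits that ends at the tab with just the tab.
result=$(printf '%s\n' "$sample" | sed "s/[^0-9]*${tab}/${tab}/")
printf '%s\n' "$result"
```

With only one tab per line, the first match is the run of non-digits directly before that tab, so the text after the last digit of $1 is removed and the second field is left alone.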
Upvotes: 1
Reputation: 23667
You can use a negated character class with anchoring to delete non-digit characters only at the end of the first field:
$ awk 'BEGIN{FS=OFS="\t"} {sub(/[^0-9]+$/, "", $1)} 1' ip.txt
chr7:55249071 EGFR
chr7:55242469_55242477 EGFR
BEGIN{FS=OFS="\t"} sets the input and output field delimiters to tab
sub(/[^0-9]+$/, "", $1) performs the substitution only on the first field; this makes it much easier to adapt to different fields compared to sed
1 is the idiomatic way to print the contents of $0
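To illustrate the point about adapting to other fields, here is a small sketch that passes the field number in through a variable. The col variable and the reordered sample line are assumptions made for the example, not part of the original data.

```shell
# Hypothetical input where the coordinate is in the second field instead.
sample=$(printf 'EGFR\tchr7:55249071C>T\n')

# col selects which field to trim; $col is then used as the sub() target.
result=$(printf '%s\n' "$sample" |
  awk -v col=2 'BEGIN{FS=OFS="\t"} {sub(/[^0-9]+$/, "", $col)} 1')
printf '%s\n' "$result"
```

Changing col is the only edit needed to target another column, whereas the sed versions would need their regular expressions rewritten.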
Upvotes: 2