Reputation: 394
I have a sample text file with the following columns:
scff2 54 92 aa_bb_c4_1024_0_2 scff2 30 18 aa_bb_c4_1024_0_2
scff8 80 96 aa_bb_c4_24_0_2 scff8 14 42 aa_bb_c4_24_0_2
scff1 20 25 aa_bb_c4_98_0_1 scff4 11 25 aa_bb_c4_13_0_1
scff6 16 61 aa_bb_c4_84_0_1 scff6 15 16 aa_bb_c4_84_0_2
I would like to remove the trailing underscore and digit from columns 4 and 8 using awk, so that the output looks like this:
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0
I tried the following command:
sed -i.bak 's/_[0-9]*$//' sample.txt
It removed the characters after the last underscore in the 8th column, but not in the 4th column. Can someone guide me toward the desired output? Thanks in advance.
Upvotes: 0
Views: 56
Reputation: 1517
awk '{gsub(/_0_./,"_0")}1' file
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0
Upvotes: 0
Reputation: 203502
It looks like all you need is:
$ sed 's/_[0-9]\( \|$\)/\1/g' file
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0
or, if your sed supports -E to enable EREs (which I expect yours does since you're using -i):
$ sed -E 's/_[0-9]( |$)/\1/g' file
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0
or as @GlennJackman pointed out in the comments, with GNU sed (the above would work with other seds too, e.g. OSX sed), it'd be:
sed 's/_[0-9]\>//g'
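To see why the attempt in the question only changed column 8, and what the alternation does differently, here is a minimal sketch on one row of the sample data. The variable names `line`, `end_only` and `both` are just for illustration, and it assumes a sed that accepts -E:

```shell
# One row from the question's sample file.
line='scff2 54 92 aa_bb_c4_1024_0_2 scff2 30 18 aa_bb_c4_1024_0_2'

# The command from the question: $ anchors at the end of the line,
# so only the suffix in the last column (column 8) can match.
end_only=$(printf '%s\n' "$line" | sed 's/_[0-9]*$//')

# Alternation: the suffix must be followed by a space OR the end of the line.
# The captured delimiter is put back with \1, and g repeats it per line.
both=$(printf '%s\n' "$line" | sed -E 's/_[0-9]( |$)/\1/g')

printf '%s\n' "$end_only" "$both"
```

The first printed line still has `_2` in column 4; the second has it removed from both columns.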
Upvotes: 3
Reputation: 37404
In GNU awk, this removes every _[0-9]+ suffix at the end of a word:
$ awk '{gsub(/_[0-9]+\>/,"")}1' file
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
...
Upvotes: 0
Reputation: 185106
Sometimes it's useful to store the result of a substitution in gawk:
$ awk '{$4=gensub(/_[0-9]$/, "", 1, $4); $8=gensub(/_[0-9]$/, "", 1, $8)}1' file
scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0
But @Barmar's solution is smarter/shorter/lighter.
Note that gensub() is not available in all awk implementations: not in nawk; you need GNU awk, or maybe mawk.
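If gensub() is not available, one portable way to keep the substituted value separately is to copy the field into a variable and run sub() on the copy. This is only a sketch: `trimmed4` and `trimmed8` are made-up variable names, and the row is taken from the question's sample.

```shell
# One row from the question's sample file.
line='scff1 20 25 aa_bb_c4_98_0_1 scff4 11 25 aa_bb_c4_13_0_1'

# sub() edits its third argument in place, so work on copies
# and leave the original fields $4 and $8 untouched.
stored=$(printf '%s\n' "$line" | awk '{
    trimmed4 = $4; sub(/_[0-9]$/, "", trimmed4)
    trimmed8 = $8; sub(/_[0-9]$/, "", trimmed8)
    print $4, "->", trimmed4
    print $8, "->", trimmed8
}')

printf '%s\n' "$stored"
```

Both the original and the trimmed value stay available inside the awk program, which is the main thing gensub() offers here.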
Upvotes: 2
Reputation: 780984
You can use sub() in awk to perform a substitution in a specific field.
awk '{sub(/_[0-9]*$/, "", $4); sub(/_[0-9]*$/, "", $8); print}' sample.txt
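If more columns need the same treatment, the idea can be wrapped in a small helper that takes a list of column numbers. This is a sketch, not a definitive version: `strip_suffix` is a hypothetical function name, and it uses `_[0-9]+$` rather than `_[0-9]*$` so that a bare trailing underscore is left alone.

```shell
# First and last rows from the question's sample file.
sample='scff2 54 92 aa_bb_c4_1024_0_2 scff2 30 18 aa_bb_c4_1024_0_2
scff6 16 61 aa_bb_c4_84_0_1 scff6 15 16 aa_bb_c4_84_0_2'

# strip_suffix (hypothetical helper): removes one trailing "_<digits>"
# group from each column named in the comma-separated list $1.
strip_suffix() {
    awk -v cols="$1" '
        BEGIN { n = split(cols, c, ",") }
        {
            for (i = 1; i <= n; i++)
                sub(/_[0-9]+$/, "", $(c[i]))
            print
        }'
}

result=$(printf '%s\n' "$sample" | strip_suffix 4,8)
printf '%s\n' "$result"
```

Because a field is assigned, awk rebuilds the line with the output field separator (a single space by default), so runs of spaces or tabs in the input would be collapsed.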
Upvotes: 3