Reputation: 394

Deleting the last characters in the specific columns

I have a sample text file with the following columns:

scff2  54   92   aa_bb_c4_1024_0_2 scff2   30  18   aa_bb_c4_1024_0_2
scff8  80   96   aa_bb_c4_24_0_2   scff8   14  42   aa_bb_c4_24_0_2
scff1  20   25   aa_bb_c4_98_0_1   scff4   11  25   aa_bb_c4_13_0_1
scff6  16   61   aa_bb_c4_84_0_1   scff6   15  16   aa_bb_c4_84_0_2

I would like to remove the trailing underscore and digit from columns 4 and 8 using awk, like this:

scff2  54   92   aa_bb_c4_1024_0 scff2   30  18   aa_bb_c4_1024_0
scff8  80   96   aa_bb_c4_24_0   scff8   14  42   aa_bb_c4_24_0
scff1  20   25   aa_bb_c4_98_0   scff4   11  25   aa_bb_c4_13_0
scff6  16   61   aa_bb_c4_84_0   scff6   15  16   aa_bb_c4_84_0

I tried the following command: sed -i.bak 's/_[0-9]*$//' sample.txt. It removed the characters after the last underscore in the 8th column, but not in the 4th column. Can someone guide me in achieving my desired output? Thanks in advance.

Upvotes: 0

Answers (5)

Claes Wikner

Reputation: 1517

awk '{gsub(/_0_./,"_0")}1' file

scff2  54   92   aa_bb_c4_1024_0 scff2   30  18   aa_bb_c4_1024_0
scff8  80   96   aa_bb_c4_24_0   scff8   14  42   aa_bb_c4_24_0
scff1  20   25   aa_bb_c4_98_0   scff4   11  25   aa_bb_c4_13_0
scff6  16   61   aa_bb_c4_84_0   scff6   15  16   aa_bb_c4_84_0

Upvotes: 0

Ed Morton

Reputation: 203502

It looks like all you need is:

$ sed 's/_[0-9]\( \|$\)/\1/g' file
scff2  54   92   aa_bb_c4_1024_0 scff2   30  18   aa_bb_c4_1024_0
scff8  80   96   aa_bb_c4_24_0   scff8   14  42   aa_bb_c4_24_0
scff1  20   25   aa_bb_c4_98_0   scff4   11  25   aa_bb_c4_13_0
scff6  16   61   aa_bb_c4_84_0   scff6   15  16   aa_bb_c4_84_0

or if your sed supports -E to enable EREs (which I expect yours does since you're using -i):

$ sed -E 's/_[0-9]( |$)/\1/g' file
scff2  54   92   aa_bb_c4_1024_0 scff2   30  18   aa_bb_c4_1024_0
scff8  80   96   aa_bb_c4_24_0   scff8   14  42   aa_bb_c4_24_0
scff1  20   25   aa_bb_c4_98_0   scff4   11  25   aa_bb_c4_13_0
scff6  16   61   aa_bb_c4_84_0   scff6   15  16   aa_bb_c4_84_0

or, as @GlennJackman pointed out in the comments, with GNU sed it would be (the above would work with other seds too, e.g. OSX sed):

sed 's/_[0-9]\>//g'
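
To see why the command in the question only changed column 8, here is a minimal sketch comparing it with the -E version above on one row from the sample. The variable names (line, end_only, both) are only for illustration, and it assumes a sed that accepts -E.

```shell
# One row copied from the question's sample file.
line='scff8  80   96   aa_bb_c4_24_0_2   scff8   14  42   aa_bb_c4_24_0_2'

# The question's pattern: $ anchors to the end of the line,
# so only the last column (column 8) can match.
end_only=$(printf '%s\n' "$line" | sed 's/_[0-9]*$//')

# Space-or-end alternation plus the g flag: the suffix is removed
# wherever it is followed by a space or the end of the line.
both=$(printf '%s\n' "$line" | sed -E 's/_[0-9]( |$)/\1/g')

printf '%s\n%s\n' "$end_only" "$both"
```

Because sed edits the line as plain text, the original spacing between columns is kept. The trade-off is that the pattern is not tied to columns 4 and 8: any column ending in an underscore and a single digit would be trimmed as well.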

Upvotes: 3

James Brown

Reputation: 37404

With GNU awk, remove every `_[0-9]+` that sits at the end of a word:

$ awk '{gsub(/_[0-9]+\>/,"")}1' file
scff2  54   92   aa_bb_c4_1024_0 scff2   30  18   aa_bb_c4_1024_0
scff8  80   96   aa_bb_c4_24_0   scff8   14  42   aa_bb_c4_24_0
...

Upvotes: 0

Gilles Quénot

Reputation: 185106

Sometimes it's useful to store the result of a substitution in gawk:

$ awk '{$4=gensub(/_[0-9]$/, "", 1, $4); $8=gensub(/_[0-9]$/, "", 1, $8)}1' file

Output:

scff2 54 92 aa_bb_c4_1024_0 scff2 30 18 aa_bb_c4_1024_0
scff8 80 96 aa_bb_c4_24_0 scff8 14 42 aa_bb_c4_24_0
scff1 20 25 aa_bb_c4_98_0 scff4 11 25 aa_bb_c4_13_0
scff6 16 61 aa_bb_c4_84_0 scff6 15 16 aa_bb_c4_84_0

But @Barmar's solution is smarter/shorter/lighter.

Note that gensub() is not available in all awk implementations: nawk lacks it, so you need GNU awk (or maybe mawk).

Upvotes: 2

Barmar

Reputation: 780984

You can use sub() in awk to perform a substitution in a specific field.

awk '{sub(/_[0-9]*$/, "", $4); sub(/_[0-9]*$/, "", $8); print}' sample.txt
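
As a rough sketch of what this does to one row of the sample (the variable names line and trimmed are only for illustration): assigning to a field makes awk rebuild the record with OFS, a single space by default, so the original column padding is not kept. This variant uses + rather than *, on the assumption that at least one digit should follow the underscore. With *, a field ending in a bare underscore would lose that underscore too.

```shell
# One row copied from the question's sample file.
line='scff2  54   92   aa_bb_c4_1024_0_2 scff2   30  18   aa_bb_c4_1024_0_2'

# sub() with a third argument edits only that field; $ here anchors to
# the end of the field, not the end of the line.
trimmed=$(printf '%s\n' "$line" |
  awk '{sub(/_[0-9]+$/, "", $4); sub(/_[0-9]+$/, "", $8); print}')

printf '%s\n' "$trimmed"
```

Only columns 4 and 8 are touched, which is the main difference from the line-wide sed and gsub() approaches in the other answers.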

Upvotes: 3
