AishwaryaKulkarni

Remove part of the data in a column in awk

I have lines like this in my files:

  1 11469   12272   ABCD:E1.121 +

and I want the output to be:

  1 11469   12272   ABCD:E1 +

I tried

  awk '{ sub(/./,"",$4); print }' file 

but I get something like

 1  11469   12272   BCD:E1.121 +

instead of what I wanted, which is

 1  11469   12272   ABCD:E1 +

Claes Wikner

awk '{sub(/E1.121 \+/,"E1 +")}1' file
1 11469   12272   ABCD:E1 +
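A minimal variant of this answer, sketched on the assumption of a POSIX-style awk such as gawk or mawk. Escaping the dot makes it match only a literal decimal point instead of any character. Because this sub() edits the whole record ($0) rather than a single field, awk does not rebuild the line from its fields, so the original runs of spaces are kept. Like the answer above, it only handles this exact text (E1.121 followed by +).

```shell
# Same substitution as above, with the dot escaped so it matches only a literal '.'.
# sub() works on $0 here, so the spacing between fields is left untouched.
printf '%s\n' '  1 11469   12272   ABCD:E1.121 +' |
  awk '{ sub(/E1\.121 \+/, "E1 +") } 1'
```

This prints `  1 11469   12272   ABCD:E1 +`, with the leading spaces and the wide gaps preserved.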

Jonathan Leffler

Note that . is a metacharacter in a regex: it matches any single character. That's why the A vanished. You need something like /\.[0-9]+/ as the regex to remove the decimal point and the digits that follow it.
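To see the three regexes side by side on field 4 alone, here is a small sketch (the variable names any, dot and frac are just illustrative). It also shows that escaping the dot by itself is not enough: /\./ removes only the point and leaves the digits behind.

```shell
# Apply three different regexes to copies of the same string.
printf '%s\n' 'ABCD:E1.121' | awk '{
  any = $0;  sub(/./, "", any)          # unescaped dot: removes the first character, here the A
  dot = $0;  sub(/\./, "", dot)         # escaped dot: removes only the literal decimal point
  frac = $0; sub(/\.[0-9]+/, "", frac)  # removes the decimal point and the digits after it
  print any; print dot; print frac
}'
```

The output is `BCD:E1.121`, then `ABCD:E1121`, then `ABCD:E1`.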

$ cat data
  1 11469   12272   ABCD:E1.121 +
$ awk '{ sub(/./,"",$4); print }' data             # Original script; wrong output
1 11469 12272 BCD:E1.121 +
$ awk '{ sub(/\.[0-9]+/, "", $4); print }' data    # Modified script; right output
1 11469 12272 ABCD:E1 +
$

Note that I've assumed you want to strip a 'fraction' (a decimal point and some digits) from the end of field 4. On the basis of the one line of sample data, that works. If necessary, you can refine the regex to match other patterns in the data and modify them appropriately. For example, you could add a $ after the + in the regex (giving /\.[0-9]+$/) to mean 'decimal point and digits at the end of the field', so that ABCD:E1.234X would not become ABCD:E1X.
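A quick way to compare the unanchored and anchored regexes, as a sketch: the second input line is a made-up example containing ABCD:E1.234X, not real data from the question. Each regex is applied to a copy of $4, so only the field values are printed.

```shell
# Print field 4 after the unanchored (loose) and anchored (strict) substitutions.
# The second input line is hypothetical, included only to show the difference.
printf '%s\n' \
  '  1 11469   12272   ABCD:E1.121 +' \
  '  1 11469   12272   ABCD:E1.234X +' |
  awk '{
    loose = $4;  sub(/\.[0-9]+/, "", loose)    # point and digits anywhere in the field
    strict = $4; sub(/\.[0-9]+$/, "", strict)  # only when they end the field
    print loose, strict
  }'
```

This prints `ABCD:E1 ABCD:E1` for the first line and `ABCD:E1X ABCD:E1.234X` for the second: the anchored regex leaves ABCD:E1.234X alone because the digits are not at the end of the field.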
