Remove part of the data in a column in awk

Question

I have a file with columns like this:

  1 11469   12272   ABCD:E1.121 +

and I want the output to be:

  1 11469   12272   ABCD:E1 +

I tried

  awk '{ sub(/./,"",$4); print }' file

and I am getting something like

 1  11469   12272   BCD:E1.121 +

instead of what I wanted, which is

 1  11469   12272   ABCD:E1 +

Jonathan Leffler · Accepted Answer

Note that . is a metacharacter in the regex; it matches any single character, and sub() replaces only the first match, which here is the first character of field 4. That's why the A vanished. You need something like /\.[0-9]+/ as the regex, to eliminate the decimal point and the digits that follow.

$ cat data
  1 11469   12272   ABCD:E1.121 +
$ awk '{ sub(/./,"",$4); print }' data             # Original script; wrong output
1 11469 12272 BCD:E1.121 +
$ awk '{ sub(/\.[0-9]+/, "", $4); print }' data    # Modified script; right output
1 11469 12272 ABCD:E1 +
$

Note that I've assumed you want to strip a 'fraction' (a decimal point and some digits) from the end of field 4. On the basis of the one line of sample data, that works. If necessary, you can refine the regex to match other patterns in the data and modify them appropriately. You could add a $ after the plus, giving /\.[0-9]+$/, to indicate 'decimal point and digits to end of field', so that ABCD:E1.234X would not become ABCD:E1X, for example.
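A minimal sketch of the anchored variant, to show the difference on a value with trailing text. The function name strip_fraction and the second input line are made up for illustration; only the first line comes from the question.

```shell
# strip_fraction is a hypothetical helper name, not part of the original answer.
# It removes ".digits" from field 4 only when the digits run to the end of the field.
strip_fraction() {
    awk '{ sub(/\.[0-9]+$/, "", $4); print }' "$@"
}

# First line is the sample from the question; the second is an invented edge case.
printf '%s\n' \
    '1 11469 12272 ABCD:E1.121 +' \
    '1 11469 12272 ABCD:E1.234X +' |
    strip_fraction
# prints:
# 1 11469 12272 ABCD:E1 +
# 1 11469 12272 ABCD:E1.234X +
```

With the $ anchor, the second line is left alone because the X after the digits prevents a match at the end of the field. Without the anchor, the same line would come out as ABCD:E1X.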
