Reputation: 31

awk to remove 5th column from N column with fixed delimiter

I have a file with N columns.
I want to remove the 5th column counting from the last column.
The delimiter is "|".

I tested with a simple example, shown below:

bash-3.2$ echo "1|2|3|4|5|6|7|8" | nawk -F\| '{print $(NF-4)}'
4

Expected result:

1|2|3|5|6|7|8

How should I change my command to get the desired output?

Upvotes: 0

Answers (4)

Fehan

Reputation: 31

Thanks for the help and guidance.

Below is what I tested:

bash-3.2$ echo "1|2|3|4|5|6|7|8|9" | nawk 'BEGIN{FS="|";OFS="|"} {$(NF-4)="!";print}' | sed 's/|!//'

Output: 1|2|3|4|6|7|8|9

I also tested it on the file extracted from my system, and it worked fine.

Upvotes: 0

Tom Fenech

Reputation: 74705

If I understand you correctly, you want to use something like this:

sed -E 's/\|[^|]*((\|[^|]*){4})$/\1/'

This matches a pipe character \| followed by any number of non-pipe characters [^|]*, then captures 4 more of the same pattern ((\|[^|]*){4}). The $ at the end matches the end of the line. The first part of the match (i.e. the fifth field from the end) is dropped.

Testing it out:

$ sed -E 's/\|[^|]*((\|[^|]*){4})$/\1/' <<<"1|2|3|4|5|6|7"
1|2|4|5|6|7

You could achieve the same thing using GNU awk with gensub but I think that sed is the right tool for the job in this case.

If your version of sed doesn't support extended regex syntax with -E, you can modify it slightly:

sed 's/|[^|]*\(\(|[^|]*\)\{4\}\)$/\1/'

In basic mode, pipes are interpreted literally, but parentheses for capture groups and curly braces need to be escaped.
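If the position needs to vary, the basic-regex form can be wrapped in a small shell function. This is only a sketch: drop_nth_from_end and its argument n are names made up for illustration, and it assumes n is a positive integer.

```shell
# Hypothetical helper: remove the field that is n-th from the end of each line.
# Uses the basic-regex form, so it does not rely on sed -E.
drop_nth_from_end() {
    n=$1
    keep=$((n - 1))   # number of trailing fields to keep after the dropped one
    # \$ stops the shell from expanding the dollar sign inside double quotes.
    sed "s/|[^|]*\(\(|[^|]*\)\{${keep}\}\)\$/\1/"
}

echo "1|2|3|4|5|6|7" | drop_nth_from_end 5
```

With n set to 5 this prints 1|2|4|5|6|7, matching the test above. Because the pattern starts with a pipe, a line whose target is its very first field is left unchanged.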

Upvotes: 2

karakfa

Reputation: 67567

Another alternative, using @sjsam's input file:

$ rev file | cut -d'|' --complement -f6 | rev 

A|B|C|E|F|G|H|I
A|B|C|D|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|O|R|Q|U|I
A|B|C|D|E|F|H|I|E|O|Q
A|B|C|D|F|G|H|I|X
A|B|C|D|E|F|H|I|J|K|L

I'm not sure whether you want the 5th or the 6th field from the end, but it's easy to adjust: change the number passed to -f.
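For instance, to target the fifth field from the end, as the question asks, -f5 would take the place of -f6. A quick check on the question's sample line, assuming rev and GNU cut (needed for --complement) are available; the variable name result is just for illustration:

```shell
# Reverse each line, drop field 5 (the 5th from the end of the original line),
# then reverse back. --complement is a GNU cut extension.
result=$(echo "1|2|3|4|5|6|7|8" | rev | cut -d'|' --complement -f5 | rev)
echo "$result"
```

This prints 1|2|3|5|6|7|8, the output the question expects.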

Upvotes: 1

sjsam

Reputation: 21975

AWK is your friend:

Sample Input

A|B|C|D|E|F|G|H|I
A|B|C|D|E|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|D|O|R|Q|U|I
A|B|C|D|E|F|G|H|I|E|O|Q
A|B|C|D|E|F|G|H|I|X
A|B|C|D|E|F|G|H|I|J|K|L

Script

awk 'BEGIN{FS="|";OFS="|"}
      {$(NF-5)="";sub(/\|\|/,"|");print}' file

Sample Output

A|B|C|E|F|G|H|I
A|B|C|D|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|O|R|Q|U|I
A|B|C|D|E|F|H|I|E|O|Q
A|B|C|D|F|G|H|I|X
A|B|C|D|E|F|H|I|J|K|L

What we did here

As you may know, awk stores each field of the record in a special variable, from $1, $2 up to $(NF).
Excluding the column that sits five places before the last one is as simple as:
- Emptying the column, i.e. $(NF-5)=""
- Removing the consecutive || left in the record by the step above, i.e.
  sub(/\|\|/,"|")
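A variation that skips the sub() step, which would also touch an empty field that was already present earlier in the line, is to rebuild the record field by field. The following is a minimal sketch, not a definitive implementation: drop_from_end and n are hypothetical names, and n counts from the end (5 means the fifth from last, i.e. $(NF-4)).

```shell
# Hypothetical helper: print each line without its n-th field from the end.
drop_from_end() {
    awk -F'|' -v n="$1" '
    {
        target = NF - n + 1          # index of the field to skip
        out = ""; sep = ""
        for (i = 1; i <= NF; i++) {
            if (i == target) continue
            out = out sep $i         # empty fields are kept as they are
            sep = "|"
        }
        print out
    }'
}

echo "1|2|3|4|5|6|7|8" | drop_from_end 5
```

This prints 1|2|3|5|6|7|8. Lines with fewer than n fields are printed unchanged, because target then falls outside the range 1..NF.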

Upvotes: 1
