codeforester

Reputation: 42999

Using multiple delimiters when one of them is a pipe character

I have a text file where fields are separated by a pipe character. Since the file is meant to be human-readable, spaces are used to align the columns.

Here is a sample input:

+------------------------------------------+----------------+------------------+
|  Column1  |   Column2    |   Column3     |    Column4     |   Last Column    |
+------------------------------------------+----------------+------------------+
| some_text |  other_text  |  third_text   |   fourth_text  |  last_text       |
<more such lines>
+------------------------------------------+----------------+------------------+

How can I use awk to extract the third field in this case? I tried:

awk -F '[ |]' '{print $3}' file
awk -F '[\|| ]' '{print $3}' file
awk -F '[\| ]' '{print $3}' file

The expected result is:

<blank>
Column3
<more column 3 values>
<blank>
third_text

I am trying to achieve this with a single awk command. Isn't that possible?

The following post talks about using pipe as a delimiter in awk but it doesn't talk about the case of multiple delimiters where one of them is a pipe character:

Upvotes: 0

Views: 311

Answers (3)

Corentin Limier

Reputation: 5006

Am I missing something?

Example input:

+------------------------------------------+----------------+------------------+
|  Column1  |   Column2    |   Column3     |    Column4     |   Last Column    |
+------------------------------------------+----------------+------------------+
| some_text |  other_text  |  third_text   |   fourth_text  |  last_text       |
| some_text2|  other_text2 |  third_text2  |   fourth_text2 |  last_text2      |
+------------------------------------------+----------------+------------------+ 

Command:

gawk -F '[| ]*' '{print $4}' file

Output:

<blank>
Column3
<blank>
third_text
third_text2
<blank>

This works for every column: to get column i, print field i+1 instead of field i, because the first field is either empty (the text before the leading pipe on data rows) or the whole +----- border line.
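One caveat with treating spaces as separators is that a value containing a space, such as `Last Column` in the header, gets split in two. A minimal sketch of a possible variant, not part of the command above: it assumes a POSIX awk and a hypothetical sample file named `table.txt`, and uses the pipe together with any surrounding spaces as a single separator.

```shell
# Build a small sample table (table.txt is a hypothetical file name).
cat > table.txt <<'EOF'
+------------------------------------------+----------------+------------------+
|  Column1  |   Column2    |   Column3     |    Column4     |   Last Column    |
+------------------------------------------+----------------+------------------+
| some_text |  other_text  |  third_text   |   fourth_text  |  last_text       |
+------------------------------------------+----------------+------------------+
EOF

# The separator is one pipe plus any spaces on either side of it.
# $1 is the empty text before the leading pipe, so column N is field N+1.
# Border lines contain no pipe, so they yield an empty line for any N > 0.
third=$(awk -F ' *[|] *' '{print $4}' table.txt)
last=$(awk -F ' *[|] *' '{print $6}' table.txt)

printf '%s\n' "$third"
printf '%s\n' "$last"
```

The i+1 offset still applies here, and values with embedded spaces stay in one field.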

Upvotes: 4

Gilles Quénot

Reputation: 185171

Perl is better suited for this use case:

$ perl -F'\s*\|\s*' -lane 'print $F[3]' File
#      ____________
#           ^
#           |
#  FULL regex support with -F switch (delimiter, like awk, but more powerful)

Upvotes: 1

KamilCuk

Reputation: 141060

First preprocess with sed: delete the first, third and last lines, replace every run of spaces, a pipe and more spaces with a single |, and remove the leading |. Then just split on | with awk (this could just as well be cut -d'|' -f3).

sed '1d;3d;$d;s/ *| */|/g;s/^|//;' file |
awk -F'|' '{print $3}'
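The `cut` alternative mentioned above can be sketched as follows. This is only an illustration, assuming a hypothetical sample file `table.txt` whose borders sit on lines 1, 3 and the last line, as in the question.

```shell
# Build a small sample table (table.txt is a hypothetical file name).
cat > table.txt <<'EOF'
+------------------------------------------+----------------+------------------+
|  Column1  |   Column2    |   Column3     |    Column4     |   Last Column    |
+------------------------------------------+----------------+------------------+
| some_text |  other_text  |  third_text   |   fourth_text  |  last_text       |
| some_text2|  other_text2 |  third_text2  |   fourth_text2 |  last_text2      |
+------------------------------------------+----------------+------------------+
EOF

# Drop the border lines (1, 3 and last), squeeze "spaces|spaces" down to "|",
# strip the leading pipe, then take the third pipe-delimited field.
col3=$(sed '1d;3d;$d;s/ *| */|/g;s/^|//' table.txt | cut -d'|' -f3)

printf '%s\n' "$col3"
```

Because the border lines are deleted rather than printed as blanks, the output contains only the column values.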

Upvotes: 0
