Reputation: 1305
I'm having problems with awk's field delimiter. The input file looks like this:
1 | all | | synonym |
1 | root | | scientific name |
2 | Bacteria | Bacteria | scientific name |
2 | Monera | Monera | in-part |
2 | Procaryotae | Procaryotae | in-part |
2 | Prokaryota | Prokaryota | in-part |
2 | Prokaryotae | Prokaryotae | in-part |
2 | bacteria | bacteria | blast name |
The field delimiter here is tab, pipe, tab (\t|\t), so in my attempt to print just the 1st and 2nd columns I ran:
awk -F'\t|\t' '{print $1 "\t" $2}' nodes.dmp | less
Instead of the desired output, the output is the 1st column followed by the pipe character. I tried escaping the pipe (\t\|\t), but the output remains the same.
1 |
1 |
2 |
2 |
2 |
2 |
Printing the 1st and 3rd columns gave me the originally intended output:
awk -F'\t|\t' '{print $1 "\t" $3}' nodes.dmp | less
But I'm puzzled as to why this is not working as intended.
I understand that the Perl one-liner below will work, but what I really want is to use awk.
perl -aln -F"\t\|\t" -e 'print $F[0],"\t",$F[1]' nodes.dmp | less
Upvotes: 4
Views: 6604
Reputation: 6240
Using the cut command:
cut -f1,2 -d'|' file.txt
Without the pipe in the output:
cut -f1,2 -d'|' file.txt | tr -d '|'
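Note that cutting on | alone leaves the surrounding tabs attached to each field. A minimal sketch of one way around that, assuming the layout described in the question (tab, pipe, tab between fields and tab, pipe at the end of the line): collapse each separator into a single tab with sed first, then let cut use its default tab delimiter. The sample line and the variable names are made up for illustration.

```shell
# Hold a literal tab so the sed expression stays portable
# (\t inside a sed pattern is a GNU extension).
tab=$(printf '\t')

# Hypothetical sample record in the layout the question describes.
sample=$(printf '1\t|\tall\t|\t\t|\tsynonym\t|')

# Turn every tab-pipe-tab into one tab, then cut on the default tab delimiter.
cols=$(printf '%s\n' "$sample" | sed "s/${tab}|${tab}/${tab}/g" | cut -f1,2)

printf '%s\n' "$cols"
```

Because the separators are replaced rather than squeezed, empty fields keep their position, so later columns do not shift.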
Upvotes: 0
Reputation: 203493
From your posted input, lines end in |, not |\t, and empty fields appear as |\t|. So an FS of tab-pipe-tab is wrong, since it won't match any of the above cases: the first is just tab-pipe, and the tab in the middle of the second will match the tab-pipe-tab from the preceding field, but then that just leaves pipe-tab for the following field, and the first leaves you with an undesirable leading tab.
What you actually need is to set the FS to just tab-pipe and then strip off the leading tab from each field:
awk -F'\t|' -v OFS='\t' '{gsub(/(^|[|])\t/,""); print $1, $2}' file
That way you can handle all fields from 1 to NF-1 exactly the same as each other.
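For comparison, here is a sketch of a different technique from the one above. It assumes the file really has tabs on both sides of every pipe except the last, as the question states, and keeps a tab-pipe-tab separator. The only irregular piece is then the tab-pipe that ends each line, which can be removed before the fields are used. The sample line and variable names are hypothetical.

```shell
# Hypothetical sample record: tab-pipe-tab between fields, tab-pipe at the end.
sample=$(printf '1\t|\tall\t|\t\t|\tsynonym\t|')

# Without cleanup, the last field still carries the trailing tab-pipe.
raw_last=$(printf '%s\n' "$sample" | awk -F'\t[|]\t' '{print $NF}')

# sub() on the record removes the terminator and makes awk re-split it,
# so $NF comes out clean.
clean_last=$(printf '%s\n' "$sample" | awk -F'\t[|]\t' '{sub(/\t[|]$/, ""); print $NF}')

printf '%s\n' "$raw_last" "$clean_last"
```

Which approach fits depends on the actual bytes in the file, so it is worth inspecting a line with od -c before choosing.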
Upvotes: 1
Reputation: 123478
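The pipe character | is a regular-expression metacharacter. awk treats a field separator longer than one character as a regex, so \t|\t means the field separator could be one of \t or \t, that is, any single tab. Put the | inside a bracket expression to tell awk to interpret it literally.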
$ awk -F'\t[|]\t' '{print $1 "\t" $2}' nodes.dmp
1 all
1 root
2 Bacteria
2 Monera
2 Procaryotae
2 Prokaryota
2 Prokaryotae
2 bacteria
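The difference can be reproduced without the real file. The sketch below builds one made-up line with printf (assuming the tab, pipe, tab layout from the question) and splits it with both separators. The variable names are for illustration only.

```shell
# Hypothetical sample record; printf expands \t into real tabs.
sample=$(printf '1\t|\tall\t|\t\t|\tsynonym\t|')

# Unbracketed: the regex reads "tab OR tab", so every tab splits
# and the second field is the pipe itself.
alternation=$(printf '%s\n' "$sample" | awk -F'\t|\t' '{print $1 "\t" $2}')

# Bracketed: the regex is the literal sequence tab, pipe, tab.
literal=$(printf '%s\n' "$sample" | awk -F'\t[|]\t' '{print $1 "\t" $2}')

printf '%s\n' "$alternation" "$literal"
```

With the unbracketed separator the odd-numbered fields hold the data and the even-numbered ones hold pipes, which is why printing $1 and $3 appeared to work in the question.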
Upvotes: 6