H-H

Reputation: 456

Using pipe character as a field separator

I'm trying different commands to process a CSV file where the separator is the pipe (|) character.

These commands work when the comma is the separator, but they throw an error when I replace it with the pipe:

awk -F[|] "NR==FNR{a[$2]=$0;next}$2 in a{ print a[$2] [|] $4 [|] $5 }" OFS=[|] file1.csv file2.csv

awk "{print NR "|" $0}" file1.csv

I tried "|", [|] and /|, to no avail.

I'm using Gawk on Windows. What am I missing?

Upvotes: 1

Views: 10625

Answers (4)

dan

Reputation: 5231

For anyone finding this years later: ALWAYS QUOTE SHELL METACHARACTERS!

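Awk itself does not need | escaped: a single-character field separator is taken literally, even when it is a regex metacharacter, so the OP's [|] was fine as far as awk is concerned. However, an unquoted | is the shell's pipe operator, and [...] is also a shell pattern, which in bash at least will only expand if it matches a file in the current working directory: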

$ cd /tmp
$ echo -F[\|]   # \| so the shell does not see a pipe
-F[|]
$ touch -- '-F|'
$ echo -F[\|]   # Same command, different output
-F|
$ echo '-F[|]'  # Good quoting
-F[|]           # Consistent output

So it should be:

awk '-F[|]'
# or
awk -F '[|]'

awk -F "[|]" would also work, but IMO, only use soft quotes (") when you have something to actually expand (or the string itself contains hard quotes ('), which can't be nested in any way).

Note that the same thing happens if these characters are inside unquoted variables.

If text or a variable contains, or may contain: []?*, quote it, or set -f to turn off pathname expansion (a single, unmatched square bracket is technically OK, I think).

If a variable contains, or may contain, an IFS character (space, tab or newline, by default), quote it (unless you want it to be split). Or, if quoting is impossible (e.g. a crazy eval), set IFS= first and bear the consequences.

Note: raw text is always split by white space, regardless of IFS.
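
To illustrate the points about unquoted variables, set -f and IFS, here is a minimal sketch for a POSIX-style shell such as bash. The scratch directory and the variable names opt and words are made up for the demonstration.

```shell
# Hypothetical scratch directory, so the pattern has exactly one file to match.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
touch -- '-F|'            # a file whose name matches the pattern -F[|]

opt='-F[|]'

# Unquoted: the expanded text is treated as a pattern and replaced
# by the matching file name.
printf '%s\n' $opt        # prints -F|

# Quoted: the text reaches the command unchanged.
printf '%s\n' "$opt"      # prints -F[|]

# With pathname expansion turned off, the unquoted form is left alone too.
set -f
printf '%s\n' $opt        # prints -F[|]
set +f

# Word splitting: an unquoted variable is split on IFS characters.
words='how are you'
set -- $words;   echo "$#"   # prints 3
set -- "$words"; echo "$#"   # prints 1
```

Quoting is usually the less surprising choice, since set -f changes behaviour for every command that follows until it is switched back with set +f.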

Upvotes: 3

fedorqui

Reputation: 289725

You tried "|", [|] and /|. /| does not work because the escape character is \, not /. [] defines a bracket expression, that is, a set of characters any one of which may match: for example, [,-] if you want FS to be either , or -.

Using "|" should work; are you sure you used it this way? Alternatively, escape it as \|:

$ echo "he|llo|how are|you" | awk -F"|" '{print $1}'
he
$ echo "he|llo|how are|you" | awk -F\| '{print $1}'
he
$ echo "he|llo|how are|you" | awk 'BEGIN{FS="|"} {print $1}'
he
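
The bracket-expression form can be checked the same way. A small sketch with made-up sample input: [,-] splits on either a comma or a hyphen, and [|] matches a literal pipe.

```shell
# Sample input is invented for illustration.
# Split on either , or - (the hyphen is literal because it comes last).
echo "a,b-c" | awk -F'[,-]' '{print $1, $2, $3}'        # prints: a b c

# A one-character bracket expression matching a literal pipe.
echo "he|llo|how are|you" | awk -F'[|]' '{print $2}'    # prints: llo
```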

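But note that when you write:

print a[$2] [|] $4 [|] $5

you are not using any delimiter at all: [|] is not valid inside the awk program itself. Since you have already defined OFS, separate the items with commas so that print inserts it: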

print a[$2], $4, $5

Example:

$ cat a
he|llo|how are|you
$ awk 'BEGIN {FS=OFS="|"} {print $1, $3}' a
he|how are
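
Putting this together, the two-file command from the question might look like the sketch below. The sample files and their column layout are assumptions made up for the demonstration, and join_pipe is a hypothetical helper name; the quoting is for a POSIX-style shell.

```shell
# Hypothetical sample files; the real column layout is not known.
dir=$(mktemp -d)
printf '%s\n' '1|k1|alpha' '2|k2|beta' > "$dir/file1.csv"
printf '%s\n' 'x|k1|y|four|five' 'x|k3|y|none|none' > "$dir/file2.csv"

# First file: remember each line, keyed by its 2nd field.
# Second file: for keys seen in the first file, print the stored line
# followed by fields 4 and 5, joined with OFS.
join_pipe() {
    awk 'BEGIN { FS = OFS = "|" }
         NR == FNR { a[$2] = $0; next }
         $2 in a   { print a[$2], $4, $5 }' "$1" "$2"
}

join_pipe "$dir/file1.csv" "$dir/file2.csv"   # prints: 1|k1|alpha|four|five
```

On Windows, cmd.exe follows different quoting rules from a POSIX shell, so one way to sidestep them is to save the awk program in a file and run it with awk -f.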

Upvotes: 5

nu11p01n73R

Reputation: 26667

You can escape the | as \|:

$ cat test
hello|world
$ awk -F\| '{print $1, $2}' test
hello world

Upvotes: 1

Jotne

Reputation: 41456

Try escaping the |:

echo "more|data"  | awk -F\| '{print $1}'
more

Upvotes: 1
