Karthik

Reputation: 2581

awk: split a column of delimited text in a row into lines

I have a file with five columns, and the second column contains pipe-delimited text. I want to split that text, remove duplicates, and print each value on its own line. I can do it with the pipeline below, but I would like to do it in a single awk script. Can anyone help?

awk -F"\t" 'NR>1{print $2}' <input file> | awk -F\| '{for (i = 0; ++i <= NF;) print $i}' | awk '!x[$0]++'

Input file:

test    hello|good|this|will|be    23421    test    4543
test2    good|would|may|can    43234    test2    3421

Output:

hello
good
this
will
be
would
may
can

Upvotes: 3

Views: 5405

Answers (3)

Jotne

Reputation: 41456

Here is how I would do it (this prints every value, duplicates included):

awk '{n=split($2,a,"|");for (i=1;i<=n;i++) print a[i]}' file
hello
good
this
will
be
good
would
may
can

Or this way. Note that for (i in a) does not guarantee any particular order, so the output order may change, although it happens to come out in input order here:

awk '{split($2,a,"|");for(i in a) print a[i]}' file
hello
good
this
will
be
good
would
may
can

Or, if you do not want duplicates in the output:

awk '{split($2,a,"|");for(i in a) if (!f[a[i]]++) print a[i]}' file
hello
good
this
will
be
would
may
can

Upvotes: 0

Tom Fenech

Reputation: 74615

You could use this single awk one-liner:

$ awk '{split($2,a,"|");for(i in a)if(!seen[a[i]]++)print a[i]}' file
will
be
hello
good
this
can
would
may

The second field is split into the array a on the | character. Each element of a is printed if it isn't already in seen, which will only be true on the first occurrence.

Note that the order in which for (i in a) visits the keys is undefined, which is why the output above is not in input order.
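To see the !seen[...]++ test in isolation, here is a minimal sketch on made-up input (the sample words are only for illustration). It is the same idiom as the awk '!x[$0]++' at the end of the pipeline in the question, just with a different array name.

```shell
# Post-increment yields the old count: 0 (false) the first time a value
# appears and non-zero afterwards, so !seen[$0]++ is true only once per value.
# A pattern with no action prints the line, so only first occurrences survive.
printf 'good\nwould\ngood\nmay\n' | awk '!seen[$0]++'
# prints good, would, may (one per line)
```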


To preserve the order, you can use this:

$ awk '{n=split($2,a,"|");for(i=1;i<=n;++i)if(!seen[a[i]]++)print a[i]}' file

split returns the number of elements in the array a, which you can use to loop through them in the order they appeared.
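Since the question asks for an awk script rather than a one-liner, the order-preserving command could be saved to a file roughly like this. This is only a sketch: the file names split_unique.awk and sample.tsv are made up, and it assumes the input is tab-separated (as the -F"\t" in the question suggests) with no header row.

```shell
# split_unique.awk and sample.tsv are hypothetical names used for illustration.
cat > split_unique.awk <<'EOF'
# Split the second tab-separated field on "|" and print each value once,
# in order of first appearance.
# Assumption: no header row. If the file has one, use NR > 1 { ... } instead.
BEGIN { FS = "\t" }
{
    n = split($2, parts, "|")
    for (i = 1; i <= n; i++)
        if (!seen[parts[i]]++)
            print parts[i]
}
EOF

# Recreate the sample input from the question with real tab characters.
printf 'test\thello|good|this|will|be\t23421\ttest\t4543\n'  > sample.tsv
printf 'test2\tgood|would|may|can\t43234\ttest2\t3421\n'    >> sample.tsv

awk -f split_unique.awk sample.tsv
```

Setting FS to a tab means a field containing spaces is not broken apart, which the default whitespace splitting in the one-liners would do.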

Upvotes: 2

glenn jackman

Reputation: 246807

I wrote exactly Tom's answer before I saw it. If you want to maintain the order of the words as they are seen, it's a little more work:

awk '
    {
        n = split($2, a, "|")
        for (i=1; i<=n; i++) 
            if (!(a[i] in seen)) {
                # the hash to store the unique keys
                seen[a[i]] = 1
                # the array to store the keys in order
                words[++count] = a[i]
            }
    }
    END {for (i=1; i<=count; i++) print words[i]}
' file
hello
good
this
will
be
would
may
can

Upvotes: 0
