user3684042

Reputation: 671

Finding common elements among all lines of a text file

I have a text file like:

a b c d e
b c e
d f g e h c

I am looking for a simple AWK script that outputs the elements common to all lines, ignoring the first element of each line. The desired output is:

c e

or

e c

Upvotes: 1

Views: 92

Answers (3)

glenn jackman

Reputation: 246807

Another perl approach:

perl -lane '
    if ($. == 1) { %intersect = map {$_ => 1} @F; next } 
    %intersect =  map {$_ => 1} grep {$intersect{$_}} @F; 
    END {print join " ", keys %intersect}
' file

The results come out in no particular order, because Perl hash keys are unordered.

Upvotes: 1

Ed Morton

Reputation: 203512

$ cat tst.awk
FNR==1 { for (i=1; i<=NF; i++) common[$i]; next }
{
    for (c in common) {
        present = 0
        for (i=1; i<=NF; i++) {
            if ($i == c) {
                present = 1
            }
        }
        if (!present) {
            delete common[c]
        }
    }
}
END {
    i=0
    for (c in common) {
        printf "%s%s", (++i>1?OFS:""), c
    }
    print ""
}
$ awk -f tst.awk file
c e

If you really want to skip the first field on each line, just change the 2 for (i=1; i<=NF; i++) loops to start at 2 instead of 1.
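To make that change concrete, here is a minimal sketch with both loops starting at 2. The input is a hypothetical sample (not the file from the question) in which every line begins with the same label k, so the difference is visible. The variable names input and common_skip_first are only for illustration, and the output is piped through sort so the order is stable:

```shell
# Hypothetical input: every line starts with the same label "k",
# which should not count as a common element.
input=$(printf '%s\n' 'k c e a' 'k c e' 'k e c x')

common_skip_first=$(printf '%s\n' "$input" | awk '
FNR==1 { for (i=2; i<=NF; i++) common[$i]; next }   # seed from line 1, fields 2..NF
{
    for (c in common) {
        present = 0
        for (i=2; i<=NF; i++)                        # again skip field 1
            if ($i == c) present = 1
        if (!present) delete common[c]               # drop elements missing from this line
    }
}
END { for (c in common) print c }                    # one element per line
' | sort | paste -sd " " -)

echo "$common_skip_first"
```

This should print c e. With the loops starting at 1, k would be reported as well, since it appears on every line.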

Although the above was accepted I actually prefer @jaypal's approach (but not his choice of tool :-) ), so here's the awk equivalent:

$ cat tst.awk
{ delete seen; for (i=1; i<=NF; i++) if (!seen[$i]++) count[$i]++ }
END {
    i=0
    for (c in count)
        if (count[c] == NR)
            printf "%s%s", (++i>1?OFS:""), c
    print ""
}
$
$ awk -f tst.awk file
c e

If your awk doesn't support delete seen, change it to split("",seen).
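As a rough sketch of that fallback, the counting script with split("",seen) in place of delete seen looks like this. It is run here on the sample lines from the question fed through a pipe. The variable name common_portable and the sort/paste post-processing are additions for illustration, used only to get a stable order:

```shell
common_portable=$(printf '%s\n' 'a b c d e' 'b c e' 'd f g e h c' | awk '
{
    split("", seen)                       # portable way to empty the array
    for (i=1; i<=NF; i++)
        if (!seen[$i]++) count[$i]++      # count each element once per line
}
END {
    for (c in count)
        if (count[c] == NR) print c       # present on every line
}' | sort | paste -sd " " -)

echo "$common_portable"
```

This should print c e, the same as the delete seen version.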

Upvotes: 3

jaypal singh

Reputation: 77105

perl to the rescue:

perl -lane '
    my %seen;
    map { $total{$F[$_]}++ unless $seen{$F[$_]}++ } 1 .. $#F; 
}{ 
    print join " ", grep { $total{$_} == $. } keys %total
' file
e c

Keep a running %total hash that counts each element at most once per line. %seen is a hash that tracks which elements have already appeared on the current line, so we declare it with my to reset it for every line. The range 1 .. $#F starts at index 1, so the first field ($F[0]) is skipped.

In the code after }{ (which acts like an END block) we just grep for the elements whose count equals the total number of lines ($.), as that means they were seen on every line.

The command line options are:

  • -l: Chomps the newline from each input line and adds it back on print.
  • -a: Splits the line on whitespace and loads an array @F with those values.
  • -n: Creates a while(<>) { .. } loop to process every line.
  • -e: Executes the code that follows in quotes.

Upvotes: 3
