Finding common elements for multiple lines in a text file

Question

Assume File.txt looks like this:

A1 B C D 
E F C H
C J 
A2 F B
D J C 
F T Y U I
B C N J Y

I need to find the lines that start with the pattern "^A" and take the elements that follow the first field (from $2 to the end of the line). Then, among the lines that start with each of those elements, I need to find the elements that they all have in common. Here is the expected output for File.txt:

A1 C J
A2 Y

or

A1 J C
A2 Y

The order of common elements (e.g. J and C) in output does not matter.

P.S. Awk is preferred.

Ed Morton · Accepted Answer

Using GNU awk for true 2D arrays (arrays of arrays), delete array, and length(array):

$ cat tst.awk
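# Record every field of each line under that line's first field.
{ for (i=1;i<=NF;i++) children[$1][$i] }
# For an A line, remember it as a parent and drop its own name from its children.
/^A/{ parents[$1]; delete children[$1][$1] }
END {
    for (parent in parents) {
        delete count    # reset the tallies for this parent
        printf "%s", parent
        # Print a field once it has been counted in every child's line.
        for (child in children[parent])
            for (grandchild in children[child])
                if (++count[grandchild] == length(children[parent]))
                    printf " %s", grandchild
        print ""
    }
}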

$ awk -f tst.awk File.txt
A1 C J
A2 Y

It works by counting, for each A line, how many of the lines named by its 2nd and later fields contain a given field. When that count equals the number of 2nd+ fields on the A line, the field occurs in every one of those lines, so it is a common element.
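If GNU awk is not available, the same counting idea can be sketched with only POSIX awk features. This is an untested-in-production sketch, not the answer's own code: the wrapper name common_elements is hypothetical, lines are stored as strings and re-split in END instead of using arrays of arrays, and parents and common elements are printed in input order rather than in for-in order.

```shell
# Hypothetical helper: common_elements FILE
common_elements() {
    awk '
        # Collect all fields seen on lines that share the same first field.
        { fields[$1] = fields[$1] " " $0 }
        # Remember the A lines in input order.
        /^A/ { parents[++np] = $0 }
        END {
            for (p = 1; p <= np; p++) {
                nkids = split(parents[p], kids, " ")
                printf "%s", kids[1]
                # kids[2..nkids] are the children named on this A line.
                for (k = 2; k <= nkids; k++) {
                    n = split(fields[kids[k]], g, " ")
                    # Count each field once per child and print it when
                    # all nkids - 1 children have contributed to it.
                    for (j = 1; j <= n; j++)
                        if (!seen[p, k, g[j]]++ && ++count[p, g[j]] == nkids - 1)
                            printf " %s", g[j]
                }
                print ""
            }
        }
    ' "$1"
}
```

Called as common_elements File.txt on the sample input from the question, this should print "A1 J C" followed by "A2 Y", which is the second of the two acceptable outputs listed in the question.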
