user3684042

Reputation: 671

Finding common elements for multiple lines in a text file

Assume File.txt looks like this:

A1 B C D 
E F C H
C J 
A2 F B
D J C 
F T Y U I
B C N J Y

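I need to find the lines that start with the pattern "^A" and take the elements after the first field (from $2 to the end of the line). For each such line, I then need to find the elements common to all the lines that start with one of those elements. For example, A1 is followed by B, C and D, and the lines starting with B, C and D all contain C and J. Here is the expected output for File.txt: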

A1 C J
A2 Y

or

A1 J C
A2 Y

The order of common elements (e.g. J and C) in output does not matter.

P.S. Awk is preferred.

Upvotes: 1

Views: 162

Answers (2)

Ed Morton

Reputation: 203209

Using GNU awk for true multidimensional arrays (arrays of arrays), plus delete array and length(array):

$ cat tst.awk
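# children[key] holds every field (including $1) of the lines whose first field is key
{ for (i=1;i<=NF;i++) children[$1][$i] }
# lines starting with A are parents; drop the parent name from its own child set
/^A/{ parents[$1]; delete children[$1][$1] }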
END {
    for (parent in parents) {
        delete count
        printf "%s", parent
        for (child in children[parent])
            for (grandchild in children[child])
                if (++count[grandchild] == length(children[parent]))
                    printf " %s", grandchild
        print ""
    }
}

$ awk -f tst.awk file
A1 C J
A2 Y

It works by counting, for each A line, how many of the lines keyed by its 2nd+ fields contain each field. When that count equals the number of 2nd+ fields on the A line, the field occurs in every one of those lines, so it is a common element.
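For awks without arrays of arrays, the same counting idea can be sketched in portable awk. This is only an illustration and not part of the answer above: the shell function name common_elements and the array names seen, members, isparent and order are made up here. It assumes fields are whitespace-separated. It prints the A lines in input order.

```shell
# Hypothetical helper (not from the answer): portable awk, no arrays of arrays.
common_elements() {
    awk '
    {
        # members[key] lists each distinct field (including $1) of lines keyed by $1
        for (i = 1; i <= NF; i++) {
            if (!(($1, $i) in seen)) {
                seen[$1, $i] = 1
                members[$1] = members[$1] " " $i
            }
        }
    }
    /^A/ && !($1 in isparent) {
        isparent[$1] = 1
        order[++nparents] = $1      # remember parents in input order
    }
    END {
        for (p = 1; p <= nparents; p++) {
            n = split(members[order[p]], kids, " ")   # kids[1] is the parent itself
            split("", count)                          # portable way to empty an array
            line = order[p]
            for (k = 2; k <= n; k++) {
                m = split(members[kids[k]], grand, " ")
                for (g = 1; g <= m; g++) {
                    # reaching n-1 means every child line contains this field
                    if (++count[grand[g]] == n - 1) {
                        line = line " " grand[g]
                    }
                }
            }
            print line
        }
    }' "$@"
}

# Sample data from the question, fed on stdin.
common_elements <<'EOF'
A1 B C D
E F C H
C J
A2 F B
D J C
F T Y U I
B C N J Y
EOF
# prints:
# A1 J C
# A2 Y
```

A count can only reach the number of children while the last child is being processed, so the common elements come out in the order they appear on that child's line.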

Upvotes: 3

Etan Reisner

Reputation: 80921

This is a bit ugly, and I feel like it should be doable in a cleaner way, but it works on the sample data at least.

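# A line: record how many elements follow its name, and map each element back to the A lines listing it
/^A/ {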
    amap[$1]=NF - 1
    for (i=2; i<=NF; i++) {
        rmap[$i]=rmap[$i] (rmap[$i]?SUBSEP:"") $1
    }
    next
}

# Line keyed by a listed element: count each of its fields once per A line that lists that element
$1 in rmap {
    split(rmap[$1], a, SUBSEP)
    for (f in a) {
        for (i=1; i<=NF; i++) {
            afmap[a[f],$i]++
        }
    }
}

# A field is common when its count equals the number of elements on the A line
END {
    for (af in afmap) {
        split(af, a, SUBSEP)
        if (afmap[af] == amap[a[1]]) {
            o[a[1]]=o[a[1]] (o[a[1]]?" ":"") a[2]
        }
    }
    for (f in o) {
        print f, o[f]
    }
}

Upvotes: 1
