Reputation: 671
Assume File.txt looks like this:
A1 B C D
E F C H
C J
A2 F B
D J C
F T Y U I
B C N J Y
I need to find the lines whose first field matches "^A" and take the elements that follow it (from $2 to the end of the line). Then, among the lines whose first field is one of those elements, I need to find the elements common to all of them. Here is the expected output for File.txt:
A1 C J
A2 Y
or
A1 J C
A2 Y
The order of common elements (e.g. J and C) in output does not matter.
P.S. Awk is preferred.
Upvotes: 1
Views: 162
Reputation: 203209
Using GNU awk for true 2D arrays, delete array, and length(array):
$ cat tst.awk
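# Record every field of a line as a child of that line's first field.
{ for (i=1;i<=NF;i++) children[$1][$i] }
# An A line is a parent; it is not a child of itself.
/^A/{ parents[$1]; delete children[$1][$1] }
END {
    for (parent in parents) {
        delete count
        printf "%s", parent
        # A grandchild seen once per child occurs in every child's line.
        for (child in children[parent])
            for (grandchild in children[child])
                if (++count[grandchild] == length(children[parent]))
                    printf " %s", grandchild
        print ""
    }
}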
$ awk -f tst.awk file
A1 C J
A2 Y
It works by counting, for each A line, how many times each field occurs across the lines whose first field is one of that A line's 2nd and later fields. A field whose count equals the number of those 2nd and later fields occurs in every one of those lines, so it is common to all of them.
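For an awk without GNU arrays of arrays, the same counting idea can be sketched with SUBSEP-joined keys. This is only a minimal sketch, not part of the answer above: the names seen, fields, nfields, and isparent are made up for illustration, it assumes the sample data from the question is saved as File.txt, and it keeps fields in input order so the output does not depend on the order of "for (x in array)".

```shell
#!/bin/sh
# Recreate the sample input from the question (assumed file name: File.txt).
cat > File.txt <<'EOF'
A1 B C D
E F C H
C J
A2 F B
D J C
F T Y U I
B C N J Y
EOF

result=$(awk '
{
    # Remember the distinct fields of every line, keyed by its first field.
    for (i = 1; i <= NF; i++)
        if (!(($1, $i) in seen)) {
            seen[$1, $i]
            fields[$1, ++nfields[$1]] = $i
        }
}
# Record each A key once, in order of appearance.
/^A/ && !($1 in isparent) { isparent[$1]; parents[++nparents] = $1 }
END {
    for (p = 1; p <= nparents; p++) {
        parent = parents[p]
        split("", count)                 # portable way to empty an array
        nchildren = nfields[parent] - 1  # fields[parent, 1] is the parent itself
        printf "%s", parent
        for (c = 2; c <= nfields[parent]; c++) {
            child = fields[parent, c]
            for (g = 1; g <= nfields[child]; g++) {
                grandchild = fields[child, g]
                # Counted once per child: present in every child line.
                if (++count[grandchild] == nchildren)
                    printf " %s", grandchild
            }
        }
        print ""
    }
}' File.txt)

printf '%s\n' "$result"
```

On the sample data this prints "A1 J C" and "A2 Y", which is the second of the two orderings the question accepts.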
Upvotes: 3
Reputation: 80921
This is a bit ugly, and I feel like it should be doable in a cleaner way, but it works on the sample data at least.
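# A lines: remember how many elements follow $1 and, for each element,
# which A lines name it.
/^A/ {
    amap[$1]=NF - 1
    for (i=2; i<=NF; i++) {
        rmap[$i]=rmap[$i] (rmap[$i]?SUBSEP:"") $1
    }
    next
}
# Lines whose first field was named by an earlier A line: count each
# field once per A line that named it.
$1 in rmap {
    split(rmap[$1], a, SUBSEP)
    for (f in a) {
        for (i=1; i<=NF; i++) {
            afmap[a[f],$i]++
        }
    }
}
END {
    # A field counted as often as its A line has elements is common to all.
    for (af in afmap) {
        split(af, a, SUBSEP)
        if (afmap[af] == amap[a[1]]) {
            o[a[1]]=o[a[1]] (o[a[1]]?" ":"") a[2]
        }
    }
    for (f in o) {
        print f, o[f]
    }
}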
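One caveat when reading the script above: a line is only counted if the A line naming its first field has already been read, so the result depends on line order. A possible way around that, sketched here as an assumption rather than as part of the original answer, is to read the file twice: collect the A lines in the first pass and count in the second. The file name Reordered.txt and the rearranged sample data are made up for illustration.

```shell
#!/bin/sh
# Sample data reordered so that every child line precedes the A lines.
cat > Reordered.txt <<'EOF'
C J
D J C
F T Y U I
B C N J Y
E F C H
A1 B C D
A2 F B
EOF

result2=$(awk '
# First pass: only collect the A lines.
NR == FNR {
    if (/^A/) {
        amap[$1] = NF - 1
        for (i = 2; i <= NF; i++)
            rmap[$i] = rmap[$i] (rmap[$i] ? SUBSEP : "") $1
    }
    next
}
# Second pass: count the fields of lines named by some A line.
!/^A/ && ($1 in rmap) {
    n = split(rmap[$1], a, SUBSEP)
    for (f = 1; f <= n; f++)
        for (i = 1; i <= NF; i++)
            afmap[a[f], $i]++
}
END {
    for (af in afmap) {
        split(af, a, SUBSEP)
        if (afmap[af] == amap[a[1]])
            o[a[1]] = o[a[1]] (o[a[1]] ? " " : "") a[2]
    }
    for (f in o)
        print f, o[f]
}' Reordered.txt Reordered.txt | sort)

printf '%s\n' "$result2"
```

The file is passed twice on purpose, and NR == FNR is true only while the first copy is being read. The order of the common elements within a line still depends on the awk implementation, which the question allows.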
Upvotes: 1