Buksy

Reputation: 12228

Get unique lines

I'm creating a graph in Graphviz and I need every connection to be displayed only once. How can I transform this input using Linux commands?

INPUT

aa -- bb[label=xyz]
ab -- bb[label=yzx]
aa -- bb[label=zxy]
ac -- ab[label=xyz]
bb -- aa[label=xzy]

DESIRED OUTPUT:

aa -- bb[label=xyz]
ab -- bb[label=yzx]
ac -- ab[label=xyz]

So aa -- bb is equivalent to bb -- aa, and the duplicate needs to be removed.

I tried sort -k1,2 -u -t[ but it didn't work with [ as the delimiter, and I don't know how to check for "reverse" entries ("xx -- yy" = "yy -- xx").

Upvotes: 2

Views: 107

Answers (3)

twalberg

Reputation: 62389

Here's one idea (not tested, but should be close):

sed -e 's/[[].*//' -e 's/-- //' input.txt |
  awk '{ if ((e[$1$2] != 1) && (e[$2$1] != 1))
         { print $1, $2
           e[$1$2] = e[$2$1] = 1
         }
       }'

The sed ... bit strips out the -- and the [label...] portions, since you don't seem to care about them, then awk keeps track of which pairs have been seen in either order and only prints them if they haven't been seen yet.
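
If the labels do matter, a variant of the same idea can keep each original line and use the node pair only as a lookup key. The following is a minimal sketch, not the answer's own code: unique_edges is a hypothetical function name, and it assumes one edge per line in the "node -- node[label=...]" form shown in the question.

```shell
#!/bin/sh
# unique_edges is a hypothetical helper, not part of any tool.
# Assumes one edge per line, shaped like: aa -- bb[label=xyz]
unique_edges() {
  awk '{
    line = $0                        # keep the untouched line for printing
    sub(/[ \t]*\[.*/, "")            # drop "[label=...]" from the working copy
    split($0, n, /[ \t]*--[ \t]*/)   # n[1] and n[2] are the two node names
    # Order-independent key: the smaller name (string comparison) goes first.
    key = ((n[1] "") < (n[2] "")) ? n[1] SUBSEP n[2] : n[2] SUBSEP n[1]
    if (!(key in seen)) {            # first time this pair appears, in either order
      seen[key] = 1
      print line
    }
  }' "$@"
}
```

It can be called as unique_edges input.txt or at the end of a pipe. On the sample input from the question it prints the three lines listed under DESIRED OUTPUT, labels included.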

Upvotes: 0

Chris Seymour

Reputation: 85795

Here is a method using awk:

$ awk -F'[[]| -- ' '!a[$1,$2]++&&!a[$2,$1]' file
aa -- bb[label=xyz]
ab -- bb[label=yzx]
ac -- ab[label=xyz]

Upvotes: 4

SteveP

Reputation: 19093

You can specify [ as the delimiter this way:

sort -k2 -u -t'['

Does that give you what you need?
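
One way to see what the delimiter does is to key the sort on the text before the [ only. This is a rough sketch under stated assumptions: it uses -k1,1 rather than -k2 (a substitution made here so that the node pair, not the label, is the key), it expects lines shaped like the question's sample, and edges_by_pair is a made-up function name.

```shell
#!/bin/sh
# edges_by_pair is a hypothetical helper name.
# -t'[' splits each line at "[", -k1,1 restricts the key to the "aa -- bb" part,
# and -u keeps one line per distinct key. LC_ALL=C pins byte-wise comparison.
edges_by_pair() {
  LC_ALL=C sort -t'[' -k1,1 -u "$@"
}
```

On the sample input this leaves four lines: the repeated aa -- bb goes away, but bb -- aa survives, because sort compares the text as written. So the delimiter alone does not cover the reversed case; the pair would have to be put into a fixed order first, for example with awk.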

Upvotes: 0
