Caleb Xu

Reputation: 330

Concatenate and manipulate tree-like structure in bash

Currently, I have a command that outputs data in the following format:

apple: banana
apple: cantaloupe
apple: durian
apple: eggplant
banana: cantaloupe
banana: durian
durian: eggplant
eggplant:

In other words, it's a tree-like structure in which apple is the root, which has children banana and eggplant, and banana also has sub-children cantaloupe and durian. eggplant has no children, yet still has a trailing colon.

I want to concatenate the output into this format:

apple: banana eggplant
banana: cantaloupe durian
durian: eggplant
eggplant:

Some objects may show up more than once in the output (in this case, cantaloupe, durian, and eggplant have multiple parent nodes). While this example doesn't have one, there may also be multiple root nodes (i.e. nodes at the same level as apple).

How would I go about modifying this output? I'm using bash/shell scripting in general right now, so I was thinking awk would probably be the best way to handle this, but if this is better handled in Python, Ruby, Perl, or some other scripting language, I'm also open to suggestions.

Upvotes: 2

Views: 414

Answers (2)

Jonathan Leffler

Reputation: 753990

awk -F: '{ list[$1] = list[$1] $2 } END { for (i in list) printf "%s:%s\n", i, list[i] }'

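Accumulate entries in awk's associative arrays, building up the list for each key. awk has no concatenation operator; placing two expressions side by side, as in list[$1] $2, joins them. At the end, print each key with its accumulated entries. The for (i in list) loop visits keys in an unspecified order, so if a particular ordering is required, you need to say so.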

Assuming that the keys on the left should be output in the order of first appearance on the LHS of the input, then you can use this slightly more complex script:

awk -F: '{ if (!($1 in list)) keys[++n] = $1; list[$1] = list[$1] $2 }
         END { for (j = 1; j <= n; j++) printf "%s:%s\n", keys[j], list[keys[j]] }'
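
As a quick check, the order-preserving script can be run against the sample input from the question. This is only a sketch: the wrapper name group_children and the here-document are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Hypothetical wrapper; the awk program is the order-preserving script above.
group_children() {
    awk -F: '
        {
            # Record each key the first time it appears on the left-hand side.
            if (!($1 in list)) keys[++n] = $1
            # With -F: the second field keeps its leading space, so plain
            # concatenation builds " banana cantaloupe ..." with no extra work.
            list[$1] = list[$1] $2
        }
        END {
            for (j = 1; j <= n; j++) printf "%s:%s\n", keys[j], list[keys[j]]
        }
    '
}

group_children <<'EOF'
apple: banana
apple: cantaloupe
apple: durian
apple: eggplant
banana: cantaloupe
banana: durian
durian: eggplant
eggplant:
EOF
```

With that input, the lines come out in order of first appearance: apple with banana, cantaloupe, durian and eggplant; then banana with cantaloupe and durian; then durian with eggplant; and finally eggplant with nothing after the colon.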

Upvotes: 2

anubhava

Reputation: 785246

You can use awk:

awk -F ': *' '{a[$1] = (a[$1]? a[$1] OFS $2 : $2)}
       END { for (i in a) print i ": " a[i] }' file
eggplant:
apple: banana cantaloupe durian eggplant
banana: cantaloupe durian
durian: eggplant

To maintain the original order:

awk -F ': *' '!($1 in a){b[++n]=$1} {a[$1] = (a[$1]? a[$1] OFS $2 : $2)}
   END{for (i=1; i<=n; i++) print b[i] ": " a[b[i]]}' file
apple: banana cantaloupe durian eggplant
banana: cantaloupe durian
durian: eggplant
eggplant:
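
One edge case, offered as a hedged sketch rather than a correction: the a[$1]? test checks the accumulated string for truthiness, so if the first child of a key happened to look like the number 0, it would be treated as "nothing stored yet" and overwritten by the next child. Assuming that could occur in your data, comparing against the empty string avoids it. The wrapper name group_ordered and the k: 0 / k: 1 input are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around a variant of the order-preserving script.
group_ordered() {
    awk -F ': *' '
        # Remember keys in order of first appearance.
        !($1 in a) { b[++n] = $1 }
        # Append with OFS only when something non-empty is already stored.
        { a[$1] = (a[$1] != "" ? a[$1] OFS $2 : $2) }
        END { for (i = 1; i <= n; i++) print b[i] ": " a[b[i]] }
    '
}

# A key whose first child is the string "0".
printf 'k: 0\nk: 1\n' | group_ordered
```

This keeps both children and prints k: 0 1. For the fruit data in the question, the result is the same as with the original script.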

Upvotes: 2
