Reputation: 3
I have a file containing 3 columns and thousands of rows. Below is an example.
File.txt
Column1 column2 column3
173 banana red
896 banana red
567 apple green
742 apple green
893 apple green
567 avocado black
345 avocado black
I need to print every value from column1, but print each column2/column3 pair only once.
I want this output:
Column1 column2 column3
173 banana red
896
567 apple green
742
893
567 avocado black
345
Even better would be the format below:
Banana-red: 173 896
Apple-green: 567 742 893
Avocado-black: 567 345
Upvotes: 0
Views: 80
Reputation: 203358
$ awk 'NR>1{k=$2"-"$3; a[k]=a[k]" "$1} END{for (k in a) print k ":" a[k]}' file
apple-green: 567 742 893
banana-red: 173 896
avocado-black: 567 345
The rows will be output in an unspecified (effectively random) order courtesy of the "in" operator; the column1 values for each key will be in the order they occur in your input. If you really want the first letter of each key capitalized as in the expected output in your question:
$ awk 'NR>1{k=$2"-"$3; a[k]=a[k]" "$1} END{for (k in a) print toupper(substr(k,1,1)) substr(k,2) ":" a[k]}' file
Apple-green: 567 742 893
Banana-red: 173 896
Avocado-black: 567 345
and if you want the rows output in the order they occurred in the input:
$ awk 'NR>1{k=$2"-"$3; a[k]=a[k]" "$1; if (!seen[k]++) keys[++numKeys]=k} END{for (keyNr=1; keyNr<=numKeys; keyNr++) {k=keys[keyNr]; print toupper(substr(k,1,1)) substr(k,2) ":" a[k]} }' file
Banana-red: 173 896
Apple-green: 567 742 893
Avocado-black: 567 345
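For the first layout in the question (column2 and column3 left blank on repeated rows), a minimal sketch along the same lines might look like the following. It assumes whitespace-separated fields and a single header line. The temporary file and the names tmp, result and seen are illustrative only and are not taken from the question.

```shell
# Hypothetical sample file mirroring the example in the question.
tmp=$(mktemp)
printf '%s\n' \
  'Column1 column2 column3' \
  '173 banana red' \
  '896 banana red' \
  '567 apple green' \
  '742 apple green' \
  '893 apple green' \
  '567 avocado black' \
  '345 avocado black' > "$tmp"

# Print the header as-is. For data rows, print the whole row the first time a
# column2/column3 pair is seen and only column1 on later occurrences.
result=$(awk '
  NR == 1 { print; next }      # header line, printed unchanged
  {
    k = $2 "-" $3              # key built from column2 and column3
    if (!seen[k]++) print $0   # first occurrence of this pair: whole row
    else            print $1   # repeated pair: column1 only
  }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because this prints each row as it is read, the output keeps the input order without needing an END block. If the same pair can reappear later in the file after other pairs, it would still be treated as a repeat; compare against the previous row's key instead if that is not what you want.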
Upvotes: 1