Reputation: 259

column unique values linux

I'm trying to keep one line per unique value of everything before the last column, while still keeping the last column's data (summed across duplicates, as in the output below). I would also like to sort the result. For example:

Input:
Africa is huge 20
India is blue glue 10
Africa is huge 10
Italy is in europe 3
America 2014 15
Italy makes pizza 3

Output:
Africa is huge 30
America 2014 15
India is blue glue 10
Italy makes pizza 3
Italy is in europe 3

I know you can use sort -n and uniq, but I'm not sure what other tools I can use. Thanks!

Upvotes: 0

Answers (2)

jai_s

Reputation: 101

If you sort on the text before the last number and keep only one line per unique key, you should get:

sed 's/\( [0-9]*$\)/,\1/' 1 |sort -t"," -k1,1 -u  |sed 's/,//'
Africa is huge 20
America 2014 15
India is blue glue 10
Italy is in europe 3
Italy makes pizza 3
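
A minimal sketch of the same pipeline reading from standard input instead of a file, to make each stage visible. The function name dedupe_by_prefix is hypothetical, and the sketch assumes the text itself contains no commas, since a comma is used as the temporary field separator. Note that this keeps one line per key rather than adding the numbers up, and which of the duplicate lines survives is left to sort.

```shell
# dedupe_by_prefix is a hypothetical helper name; it reads lines on stdin.
# Assumption: the text before the trailing number contains no commas.
dedupe_by_prefix() {
  # 1. insert a comma before the trailing number so sort can split on it
  # 2. sort on the text before the comma and keep one line per distinct key
  # 3. remove the comma marker again
  sed 's/\( [0-9]*$\)/,\1/' |
    LC_ALL=C sort -t, -k1,1 -u |
    sed 's/,//'
}

printf '%s\n' \
  'Africa is huge 20' \
  'Africa is huge 10' \
  'America 2014 15' | dedupe_by_prefix
```

For the three sample lines this prints two lines: one "Africa is huge" line and the "America 2014 15" line.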

Upvotes: 0

hek2mgl

Reputation: 158260

uniq will not work here, because it cannot sum the values in the second column. But you can use awk for that:

awk '{a[$1]+=$2}END{for(i in a) print i,a[i]}' input.file
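
As a quick check of the two-column version, here is a small sketch on made-up two-column data. The function name sum_two_columns and the sample lines are assumptions for illustration only. Because awk's for-in loop visits keys in no particular order, the result is piped through sort.

```shell
# sum_two_columns is a hypothetical helper name; it reads "key number" lines on stdin.
sum_two_columns() {
  # group by the first field, add up the second field, then sort the result
  awk '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
    LC_ALL=C sort
}

printf '%s\n' 'Africa 20' 'India 10' 'Africa 10' | sum_two_columns
# prints:
# Africa 30
# India 10
```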

You have changed the input data a bit, so the awk script needs to be generalized. The script above groups the data by the first column and sums the second column. The script below groups the data by everything up to the last column and sums the last column:

awk 'match($0,/.* /){a[substr($0,RSTART,RLENGTH-1)]+=$NF}END{for(i in a)print i,a[i]}' file
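
The question also asks for sorted output, and awk's for-in loop does not guarantee any order. A minimal sketch that combines the grouping with a final sort, run on the sample data from the question, could look like this. The function name sum_by_prefix is hypothetical, and the sketch assumes every line ends with a number separated from the text by whitespace.

```shell
# sum_by_prefix is a hypothetical helper name; it reads stdin or the files given as arguments.
sum_by_prefix() {
  awk '{
    n = $NF                          # last field holds the number to add up
    key = $0
    sub(/[ \t]+[^ \t]+$/, "", key)   # strip the last field, leaving the grouping key
    total[key] += n
  }
  END { for (k in total) print k, total[k] }' "$@" |
    LC_ALL=C sort                    # for-in order is unspecified, so sort afterwards
}

printf '%s\n' \
  'Africa is huge 20' \
  'India is blue glue 10' \
  'Africa is huge 10' \
  'Italy is in europe 3' \
  'America 2014 15' \
  'Italy makes pizza 3' | sum_by_prefix
# prints:
# Africa is huge 30
# America 2014 15
# India is blue glue 10
# Italy is in europe 3
# Italy makes pizza 3
```

The sort here is a plain byte-order sort of whole lines, so the two Italy lines come out in alphabetical order rather than in the order shown in the question.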

Upvotes: 2
