bgame2498
bgame2498

Reputation: 4857

find the largest and smallest value in a column that corresponds with another column

I have a 5 column, 10,000+ row CSV data file. I need to find the largest and smallest value from column 2, for each different value in column 1, and then write those to a new file. I am relatively new to grep, awk, head, tail, and the others alike. Here is a few lines of the file in what we can call temp.csv

1553 1806.345000 20130516-044310 33.800000 -97.110000 

1555 2106.947000 20130516-044310 33.470000 -94.620000

1559 2106.947000 20130516-044310 31.460000 -97.260000

1573 1807.591000 20130516-045311 41.150000 -94.020000

1573 2107.911000 20130516-045311 41.120000 -94.020000

1573 2408.994000 20130516-045311 41.120000 -94.050000

1573 2709.545000 20130516-045311 41.090000 -94.020000

1573 3010.308000 20130516-045311 41.090000 -94.020000

1573 3310.988000 20130516-045311 41.090000 -93.990000

1573 3611.129000 20130516-045311 41.120000 -93.960000

1573 3912.392000 20130516-045311 41.090000 -93.960000

1585 1806.756000 20130516-045812 31.040000 -98.880000

1585 2107.839000 20130516-045812 31.040000 -98.850000

1585 2408.390000 20130516-045812 31.010000 -98.820000

1585 2709.153000 20130516-045812 31.010000 -98.790000

1611 1804.813000 20130516-051316 31.280000 -97.800000

So for example from these data, I would want the output to look like:

1553 1806.345000 20130516-044310 33.800000 -97.110000 

1555 2106.947000 20130516-044310 33.470000 -94.620000

1559 2106.947000 20130516-044310 31.460000 -97.260000

1573 1807.591000 20130516-045311 41.150000 -94.020000

1573 3912.392000 20130516-045311 41.090000 -93.960000

1585 1806.756000 20130516-045812 31.040000 -98.880000

1585 2709.153000 20130516-045812 31.010000 -98.790000

1611 1804.813000 20130516-051316 31.280000 -97.800000

Some numbers in the first row will only have one entry, in which it would obviously be the largest and smallest. Any help would be appreciated.

Upvotes: 1

Views: 155

Answers (2)

jaypal singh
jaypal singh

Reputation: 77185

Here is one way of accomplishing the task. It doesn't care if the data is sorted or not:

awk '  
$1 in keys {
    map["min",$1] = (keys[$1] < $2 ? map["min",$1] : $0); 
    map["max",$1] = (keys[$1] > $2 ? map["max",$1] : $0);
}
NF {
    keys[$1] = $2;
}
!seen[$1]++ {
    map["min",$1] = $0;
    map["max",$1] = $0;
}
END {
    for (key in keys) {
        if (map["min",key] == map["max",key]) {
            print map["min",key]
        }
        else { 
            print map["min",key]
            print map["max",key]
        }
    }
}' file
1611 1804.813000 20130516-051316 31.280000 -97.800000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1553 1806.345000 20130516-044310 33.800000 -97.110000 
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 3912.392000 20130516-045311 41.090000 -93.960000

Upvotes: 3

konsolebox
konsolebox

Reputation: 75628

awk '!NF { next }
     !a[$1]++ { if (length(p)) print p; print; p = ""; next }
     { p = $0 } END { if (length(p)) print p }' file

Output:

1553 1806.345000 20130516-044310 33.800000 -97.110000 
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 3912.392000 20130516-045311 41.090000 -93.960000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1611 1804.813000 20130516-051316 31.280000 -97.800000

Upvotes: 0

Related Questions