Dong

Reputation: 115

Finding minimum and maximum values of the first column - grouped by the second column

I have lots of unsorted data in a text file in the following form:

1.0 10
1.8 10
1.1 10
1.9 20
2.8 20
2.1 20
2.9 20
...

For each value in the second column, I want to get the interval (minimum and maximum) of the values in the first column. So for the example above, the result should be:

1.0 1.8 10
1.9 2.9 20

How can I do this with C/C++, awk, or other Linux shell tools?

Upvotes: 0

Views: 123

Answers (4)

user918938


To add another alternative, you could do this in R as well:

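# Read the data file named by the first command-line argument
d.in <- read.table(file = commandArgs(trailingOnly = TRUE)[1])
# Min and max of V1 per V2 group; reorder so the group column comes last
write.table(
    aggregate(V1 ~ V2, d.in, function(x) c(min(x), max(x)))[, c(2, 1)]
    , row.names = FALSE
    , col.names = FALSE
    , sep = "\t")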

Then just call this script with Rscript:

$ Rscript script.R data.txt 
1       1.8     10
1.9     2.9     20

Upvotes: 0

knittl

Reputation: 265201

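I think this should work, provided the input is first sorted by the second column and then by the first (for example, by piping it through sort -k2,2n -k1,1n), because the loop only reports the first and last value of each run:

{ read -r vStart int && v=$vStart &&
while read -r vNext nextInt; do
  # A change in the second column closes the current group
  if [ "$int" -ne "$nextInt" ]; then
    echo "$vStart $v $int"
    vStart=$vNext
  fi

  v=$vNext
  int=$nextInt
done &&
echo "$vStart $v $int"; }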
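
The loop only compares neighbouring lines, so on unsorted input it would report the first and last value of each run instead of the minimum and maximum. A minimal sketch of one way to handle that, assuming the data sits in a file, the second column holds integers, and using the hypothetical function name minmax_sorted, is to sort the input before feeding it to the loop:

```shell
# minmax_sorted is a hypothetical helper name, not part of the original answer.
# Usage: minmax_sorted FILE   (FILE holds "value group" lines)
minmax_sorted() {
  # Sort by group (column 2), then numerically by value (column 1), so the
  # first line of each run is the minimum and the last line is the maximum.
  # LC_ALL=C makes sort treat "." as the decimal point.
  LC_ALL=C sort -k2,2n -k1,1n "$1" | {
    if read -r vStart int; then
      v=$vStart
      while read -r vNext nextInt; do
        # A change in the group column closes the current group
        if [ "$int" -ne "$nextInt" ]; then
          echo "$vStart $v $int"
          vStart=$vNext
        fi
        v=$vNext
        int=$nextInt
      done
      # Emit the final group
      echo "$vStart $v $int"
    fi
  }
}
```

Called as minmax_sorted data.txt, this prints one line per group in ascending group order. The sort costs O(n log n), so for very large files a single-pass awk approach may be preferable.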

Upvotes: 0

anubhava

Reputation: 785058

You can use this awk:

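awk '{
        # +0 forces a numeric comparison of the stored values
        if (!($2 in nmin) || $1+0 < nmin[$2]+0)
            nmin[$2]=$1
        # checked independently so the first value of a group sets both
        if (!($2 in nmax) || $1+0 > nmax[$2]+0)
            nmax[$2]=$1
     }
     END {
        for (a in nmin)
           print nmin[a], nmax[a], a
     }
' inFile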
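
Note that awk visits array keys in an unspecified order in a for (a in nmin) loop, so the groups may come out in any order. A minimal sketch, assuming you want the result ordered by the group column and using the hypothetical wrapper name group_minmax, pipes the awk output through sort:

```shell
# group_minmax is a hypothetical wrapper name, not part of the original answer.
# Usage: group_minmax FILE   (FILE holds "value group" lines, in any order)
group_minmax() {
  awk '
    {
      # +0 forces a numeric comparison even if the stored value is a string
      if (!($2 in nmin) || $1 + 0 < nmin[$2] + 0) nmin[$2] = $1
      if (!($2 in nmax) || $1 + 0 > nmax[$2] + 0) nmax[$2] = $1
    }
    END { for (g in nmin) print nmin[g], nmax[g], g }
  ' "$1" | LC_ALL=C sort -k3,3n   # order the result by the group column
}
```

Only the per-group results are sorted here, not the raw data, so the extra cost stays small even for large inputs.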

Upvotes: 1

Kent

Reputation: 195039

This one-liner should work for you:

 awk '!($2 in i)||$1+0<i[$2]+0{i[$2]=$1} !($2 in a)||$1+0>a[$2]+0{a[$2]=$1} END{for(x in i)print i[x],a[x],x}' file

Output (the order of the groups is not guaranteed):

1.0 1.8 10
1.9 2.9 20

Upvotes: 1
