Reputation: 115
I have a lot of unsorted data in a text file in the following form:
1.0 10
1.8 10
1.1 10
1.9 20
2.8 20
2.1 20
2.9 20
...
For each value in the second column, I want the range (minimum and maximum) of the corresponding values in the first column. For the example above, the result should be:
1.0 1.8 10
1.9 2.9 20
How can I do this with C/C++, awk, or other Linux shell tools?
Upvotes: 0
Views: 123
Reputation:
To add another alternative, you could do this in R as well:
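# Read the file named by the first command-line argument
d.in <- read.table(file = commandArgs(trailingOnly = TRUE)[1])
# Min and max of V1 per value of V2, printed as: min, max, V2
write.table(
  aggregate(V1 ~ V2, d.in, function(x) c(min(x), max(x)))[, c(2, 1)],
  row.names = FALSE,
  col.names = FALSE,
  sep = "\t"
)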
Then just call this script with Rscript:
$ Rscript script.R data.txt
1 1.8 10
1.9 2.9 20
Upvotes: 0
Reputation: 265201
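I think this should work, provided lines with the same second-column value are adjacent and the second column is an integer. Note that it prints the first and last first-column values of each run, so they are the minimum and maximum only if each run is in ascending order:
{ read -r vStart int && v=$vStart &&  # v starts at vStart so a one-line run prints correctly
  while read -r vNext nextInt; do
    if [ "$int" -ne "$nextInt" ]; then
      # The key changed: print the finished run and start a new one
      echo "$vStart $v $int"
      vStart=$vNext
    fi
    v=$vNext
    int=$nextInt
  done &&
  echo "$vStart $v $int"; }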
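Because the loop above only looks at runs of adjacent lines, unsorted input can be sorted first so that each run starts at its minimum and ends at its maximum. The following is only a sketch under a few assumptions: `col_ranges` is a hypothetical function name, both columns are numeric, the second column is an integer, and the data arrives on standard input.

```shell
# col_ranges is a hypothetical helper name, not part of any tool.
# Reads "value key" lines on stdin and prints "min max key" per key.
col_ranges() {
    # Sort numerically by key (column 2), then by value (column 1).
    # LC_ALL=C keeps "." as the decimal point regardless of locale.
    LC_ALL=C sort -k2,2n -k1,1n | {
        read -r vStart int && v=$vStart &&
        while read -r vNext nextInt; do
            if [ "$int" -ne "$nextInt" ]; then
                # Key changed: the previous run is complete
                echo "$vStart $v $int"
                vStart=$vNext
            fi
            v=$vNext
            int=$nextInt
        done &&
        echo "$vStart $v $int"
    }
}
```

It could then be called as `col_ranges < data.txt`, where `data.txt` stands for whatever file holds the data.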
Upvotes: 0
Reputation: 785058
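You can use this awk script:
awk '{
    if (!($2 in nmin)) {
        # First value seen for this key: it is both min and max so far
        nmin[$2] = $1
        nmax[$2] = $1
    } else if ($1 < nmin[$2])
        nmin[$2] = $1
    else if ($1 > nmax[$2])
        nmax[$2] = $1
}
END {
    for (a in nmin)
        print nmin[a], nmax[a], a
}
' inFile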
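One detail worth knowing: the order in which awk's `for (a in nmin)` visits keys is unspecified, so the groups may be printed in any order. A minimal sketch of one way to get a stable order, assuming the keys are numeric, is to pipe the result through `sort`. The function name `minmax_sorted` is hypothetical.

```shell
# minmax_sorted is a hypothetical wrapper name.
# Prints "min max key" per key, ordered numerically by key.
minmax_sorted() {
    awk '
    # First value for this key is both the min and the max so far
    !($2 in nmin) { nmin[$2] = $1; nmax[$2] = $1; next }
    # Adding 0 forces a numeric comparison; the stored text is kept as read
    $1 + 0 < nmin[$2] + 0 { nmin[$2] = $1 }
    $1 + 0 > nmax[$2] + 0 { nmax[$2] = $1 }
    END { for (a in nmin) print nmin[a], nmax[a], a }
    ' "$@" | LC_ALL=C sort -k3,3n
}
```

With no arguments it reads standard input; otherwise it reads the named files, for example `minmax_sorted inFile`.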
Upvotes: 1
Reputation: 195039
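This one-liner should work for you. It tracks the minimum and maximum of the first column for each value of the second:
awk '!($2 in i){i[$2]=$1;a[$2]=$1}$1<i[$2]{i[$2]=$1}$1>a[$2]{a[$2]=$1}END{for(x in i)print i[x],a[x],x}' file
Output: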
1.0 1.8 10
1.9 2.9 20
Upvotes: 1