Reputation: 1051
I'm trying to find the max values in a numeric string and some of the data contains trailing 9s.
999999999999 63 66 69 71 73 75 76 78 80 81 81 80 79 74 67 63999999999999999
I've been using the following command to find the max value of the numbers but, obviously, the command sees the data with trailing 9s as the "max" (e.g., 6399999...) and ignores the actual max values. Some of the data is also bad data that is just a run of 9s.
grep -Eo '[0-9]+' file_temp | sort -rn | head -n 1 > file_temp_max
How can I get rid of the bad data (e.g., 999999), and how can I correct the values with trailing 9s (6399999... -> 64) so that they are rounded and included in the data set?
Upvotes: 0
Views: 145
Reputation: 16138
Building from your example code:
grep -Eo '[0-9]+' file_temp | awk '
$1 ~ /999999999999999/ { sub(/999999999999999$/,""); $1++}
$0 != 999999999999'
This gets each number on its own line, then uses awk to revise each line. awk examines any line with fifteen 9s, peels them off, then increments the number. The next line prints anything that isn't twelve nines.
The above assumes 1239999999999999999 should be 1240. If instead it should be 124:
grep -Eo '[0-9]+' file_temp | awk '
$1 ~ /^999+$/ { next }
$1 ~ /999$/ { sub(/9+$/,""); $1++}
{ print }'
The first awk line skips lines that are just nines, the second removes all trailing nines and increments the number, the third prints. I'm keying on 3+ nines on the assumption that 9 and 99 are valid.
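To tie this back to the original goal, the cleanup can feed straight into the sort and head step from the question. A minimal sketch, assuming the sample line from the question as input (supplied through a variable here rather than read from file_temp) and the "three or more nines" rule above; the variable names data and max are only for illustration:

```shell
# Sample line from the question; in practice this would come from file_temp.
data='999999999999 63 66 69 71 73 75 76 78 80 81 81 80 79 74 67 63999999999999999'

max=$(printf '%s\n' "$data" |
  grep -Eo '[0-9]+' |
  awk '
    $1 ~ /^999+$/ { next }                  # drop fields that are only nines
    $1 ~ /999$/   { sub(/9+$/, ""); $1++ }  # strip trailing nines, round up
    { print }' |
  sort -rn | head -n 1)

echo "$max"
```

With this input, 69 and 79 are left alone because they do not end in three nines, and the padded last field is counted as 64.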
Upvotes: 0
Reputation: 289665
To "clean" the data, you can do the following by looping through all the fields:
- If a field consists only of 9s, remove it.
- If a field ends in 9s, remove them and increment the remaining number by one.
See it in action with your given input:
$ awk '{for(i=1;i<=NF;i++) {if ($i~/^9+$/) $i=""; if (sub(/9+$/,"",$i)) $i++}}1' a
63 66 7 71 73 75 76 78 80 81 81 80 8 74 67 64
Then getting the maximum value is trivial by using any of the algorithms in How to get the biggest number in a file?
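Note that in the output above 69 and 79 have become 7 and 8, because 9+$ also matches a single trailing 9. If values such as 69 are legitimate readings, one option is to require a minimum run of nines before treating it as padding. A sketch, assuming three or more nines mark bad data (that threshold is a guess, not something the question states), which also tracks the maximum in the same pass:

```shell
data='999999999999 63 66 69 71 73 75 76 78 80 81 81 80 79 74 67 63999999999999999'

result=$(printf '%s\n' "$data" | awk '
{
  for (i = 1; i <= NF; i++) {
    v = $i
    if (v ~ /^999+$/) continue      # field is only nines: skip it
    if (v ~ /999$/) {               # 3+ trailing nines: treat as padding
      sub(/9+$/, "", v)
      v++                           # round the remaining number up
    }
    if (max == "" || v + 0 > max) max = v + 0
  }
}
END { print max }')

echo "$result"
```

Working on a copy of each field (v) leaves the input record untouched, so the same loop could also print the cleaned values if needed.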
Upvotes: 1
Reputation: 246807
I'm assuming that "a space followed by 2 digits" is a valid way to extract the numbers you want:
echo 999999999999 63 66 69 71 73 75 76 78 80 81 81 80 79 74 67 63999999999999999 |
grep -o ' [0-9][0-9]' |
sort -n |
tail -1
produces
81
Upvotes: 0
Reputation: 785128
You can use this awk:
awk -v RS=' ' '{gsub(/9+$/, ".&", $1); $1=int($1); print $1; if ($1>max) max=$1}
END{print "max = ", max}' file
0
63
66
6
71
73
75
76
78
80
81
81
80
7
74
67
64
max = 81
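gsub(/9+$/, ".&", $1)
inserts a decimal point before the trailing 9s.
$1=int($1)
takes the integer part of the resulting decimal. Note that int() truncates: a long run of 9s only comes out as the next integer because the value is beyond double precision, while a short one such as 6.9 becomes 6.
if ($1>max) max=$1
is a simple max computation.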
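The behaviour of int() on these values is easy to check directly. A small sketch, assuming an awk that uses double-precision numbers (the usual default), comparing a short run of nines, a long run, and a value like 69 after the decimal point has been inserted:

```shell
out=$(awk 'BEGIN {
  # short run of nines: truncated down to 63
  # fifteen nines: the string already converts to the double 64
  # 69 turned into 6.9: truncated down to 6
  print int("63.999"), int("63.999999999999999"), int("6.9")
}')
echo "$out"
```

On a typical awk this prints 63 64 6, which matches the 6 and 7 that appear in the output above where 69 and 79 were expected.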
Upvotes: 0
Reputation: 2527
This is a slightly different approach from Adam's answer and uses sed from within a loop.
First off, I'm working on the assumption that you don't know how many 9's will be included. Secondly, I'm using an intermediate conversion to float.
for line in $(cat file_temp); do
i=$(echo $line | sed 's/../.&/;t;s/^.$/.0&/');
printf "%.02f\n" $i;
done | sed 's/\.//;s/^0//' | sort -nr
Breakdown:
sed 's/../.&/;t;s/^.$/.0&/'
add a decimal point before the first two characters (63999... becomes .63999...); a single-character value gets .0 prepended instead
printf "%.02f\n" $i;
print the value as a floating point number with two decimal places - this rounds it for you.
sed 's/\.//;s/^0//'
strip leading 0 and . leaving just the remaining integer
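To see what each stage does, it can help to push a few values through by hand. A sketch using three fields from the question's sample; the helper name convert is made up for this illustration, and it assumes a sed that accepts t followed by a semicolon (GNU sed does) and a locale that uses a dot as the decimal separator:

```shell
# Hypothetical helper wrapping the per-field steps from the loop above.
convert() {
  i=$(echo "$1" | sed 's/../.&/;t;s/^.$/.0&/')   # 63999... -> .63999...
  printf "%.02f\n" "$i" | sed 's/\.//;s/^0//'    # 0.64 -> 064 -> 64
}

a=$(convert 63)
b=$(convert 63999999999999999)
c=$(convert 999999999999)
echo "$a $b $c"
```

The third value shows a limitation: a field of all 9s rounds to 1.00 and comes out as 100 rather than being dropped, so such fields would still need to be filtered out before taking the maximum.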
Upvotes: 1