Reputation: 97
With the following input,
08 V 3.8 0.0 23.456 60.459 60.459
09 M 4.4 0.0 24.960 72.301 72.301
10 L 4.4 0.0 25.301 95.197 95.197
11 L 1.9 0.0 25.410 99.173 99.173
12 L 1.7 0.0 25.484 99.862 99.862
104 V 7.1 0.0 0.374 5.225 5.225
105 L 0.7 0.0 0.374 5.119 5.119
169 V 4.6 0.1 0.000 31.658 1.658
170 S 5.7 0.0 0.000 32.117 1.117
171 S 5.7 0.0 0.000 32.117 5.001
260 Y 4.8 0.0 0.342 54.178 54.178
261 S 4.1 0.0 0.144 67.833 67.833
262 I 8.4 0.0 0.000 87.300 87.300
263 I 9.5 0.0 0.000 88.950 88.950
264 I 11.3 0.1 0.000 89.070 89.070
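I want to output the rows that match the following two conditions: the first column is part of a run of at least five consecutive numbers, and the 6th column value is greater than 5.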
So the output for the above input should be as follows. The rows that start with 104 and 105 are excluded because these are only two consecutive numbers, while the rows that start with 169 and 170 are excluded because the 6th column value is less than 5.
08 V 3.8 0.0 23.456 60.459 60.459
09 M 4.4 0.0 24.960 72.301 72.301
10 L 4.4 0.0 25.301 95.197 95.197
11 L 1.9 0.0 25.410 99.173 99.173
12 L 1.7 0.0 25.484 99.862 99.862
260 Y 4.8 0.0 0.342 54.178 54.178
261 S 4.1 0.0 0.144 67.833 67.833
262 I 8.4 0.0 0.000 87.300 87.300
263 I 9.5 0.0 0.000 88.950 88.950
264 I 11.3 0.1 0.000 89.070 89.070
The first part of the code is straightforward with the following awk one-liner
LC_ALL=C awk '$6>5 {print}' input
But I am getting stumped by the second condition. I would really appreciate any help with it.
Upvotes: 1
Views: 121
Reputation: 16819
I'm not sure I understand the criterion but this code gives the correct output with your test data.
LC_ALL=C awk '
    function maybeprint() {
        if (n>=5 && ok) print buf
    }
    {
        if ($1 != p1+1) {
            maybeprint()
            buf = $0
            n = ok = 1
        } else {
            buf = buf RS $0
            ++n
        }
        p1 = $1
    }
    $6<=5 { ok = 0 }
    END { maybeprint() }
' input
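The code buffers lines while tracking state (the number of consecutive lines so far, and whether all values of $6 have been > 5).
When $1 is not one more than the previous value, it prints the buffer if the state is appropriate, then clears the buffer and resets the state.
Upvotes: 4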
Reputation: 17216
OP's definition of "consecutive" is a little different from what I understood at first:
Let n0 be the value of $1 at iteration 0, v0 the value of $6 at iteration 0, n1 the value of $1 at iteration 1, and v1 the value of $6 at iteration 1.
Two consecutive lines must satisfy the three following conditions:
n0 + 1 == n1
v0 > 5
v1 > 5
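As a quick check of that definition, here is a minimal sketch that labels each line according to whether it forms a "consecutive" pair with the line before it. The function name label_pairs, the variables prev_n and prev_v, and the sample rows are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: label each line by whether it forms a "consecutive"
# pair with the previous line (n0 + 1 == n1, v0 > 5, v1 > 5).
label_pairs() {
    LC_ALL=C awk '
    NR > 1 {
        pair = (prev_n + 1 == $1 && prev_v > 5 && $6 > 5)
        print prev_n, "->", $1 + 0, (pair ? "consecutive" : "not consecutive")
    }
    { prev_n = $1 + 0; prev_v = $6 + 0 }
    '
}

# Made-up sample rows (the third one has a 6th column below 5).
printf '%s\n' \
    '10 L 4.4 0.0 25.301 95.197 95.197' \
    '11 L 1.9 0.0 25.410 99.173 99.173' \
    '12 L 1.7 0.0 25.484 2.000 2.000' \
    '20 V 7.1 0.0 0.374 5.225 5.225' | label_pairs
# 10 -> 11 consecutive
# 11 -> 12 not consecutive
# 12 -> 20 not consecutive
```

The pair 11 -> 12 fails only because of the third condition, and 12 -> 20 fails the first one.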
Then if you need to output the lines that are part of a group of at least 5 consecutive lines then you can use something like this:
awk '
    {
        consecutive_count = ($6 > 5 ? ($1-1 == previous_1 ? consecutive_count+1 : 1) : 0)
        saved_lines[consecutive_count] = $0
        previous_1 = $1+0
    }
    consecutive_count == 5 {
        for (i = 1; i <= consecutive_count; i++)
            print saved_lines[i]
    }
    consecutive_count > 5
'
As @Daweo said, you need to buffer some input lines because you can't tell how many consecutive ones there will be while examining the current line. Because your constraint is 5 consecutive lines, you can store just 5 of them in an array, using for example NR modulo five as the key.
You also need to track the current number of "consecutive" lines; for that, save the current value of $1 and use it when processing the next line.
Here you go:
awk '
    {
        buffered_lines[NR%5] = $0
        consecutives_lines = ($1-1 == previous_1 ? consecutives_lines + 1 : 1)
        previous_1 = $1+0
    }
    consecutives_lines == 5 {
        for (i = NR-4; i <= NR ; i++) {
            $0 = buffered_lines[i%5]
            if ($6+0 > 5)
                print
        }
    }
    consecutives_lines > 5 && $6+0 > 5
'
Note: by setting $0 to buffered_lines[...] you can make awk redo the splitting for you and then access the 6th field of the "buffered line" as $6. You have to be careful when using this method, as the current line will be lost. Here it isn't harmful because $0, $1, etc. are not used further down the code (and incidentally, the current line is also restored in $0 in the last iteration of the for loop).
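A small sketch of that re-splitting behaviour, assuming a made-up saved line held in a hypothetical variable named saved (the function name resplit_demo is also invented for illustration): once the saved line is assigned to $0, awk recomputes NF and the fields, and the record that was being processed is no longer available.

```shell
# Illustrative only: assigning to $0 makes awk re-split the record.
resplit_demo() {
    awk '
    {
        saved = "169 V 4.6 0.1 0.000 31.658 1.658"   # made-up saved line
        print "before:", NF, $1
        $0 = saved                                   # fields are re-split here
        print "after:", NF, $6
    }
    '
}

echo 'only two' | resplit_demo
# before: 2 only
# after: 7 31.658
```

If the original record is still needed afterwards, it has to be copied into a variable before the assignment.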
With the question's input, both scripts print:
08 V 3.8 0.0 23.456 60.459 60.459
09 M 4.4 0.0 24.960 72.301 72.301
10 L 4.4 0.0 25.301 95.197 95.197
11 L 1.9 0.0 25.410 99.173 99.173
12 L 1.7 0.0 25.484 99.862 99.862
260 Y 4.8 0.0 0.342 54.178 54.178
261 S 4.1 0.0 0.144 67.833 67.833
262 I 8.4 0.0 0.000 87.300 87.300
263 I 9.5 0.0 0.000 88.950 88.950
264 I 11.3 0.1 0.000 89.070 89.070
Upvotes: 3
Reputation: 36700
Really appreciate any help with it.
GNU AWK allows you to store values in an array. When you need a value from n lines earlier, it is handy to use NR (the number of the current record) as the key. Consider the following simple example. Let the content of file.txt be
0 Able
2 Baker
4 Charlie
8 Dog
16 Easy
then
awk '{arr[NR]=$1;print "current value",arr[NR],"value 1 line before",(NR-1 in arr?arr[NR-1]:"n/a"),"value 2 lines before",(NR-2 in arr?arr[NR-2]:"n/a")}' file.txt
gives output
current value 0 value 1 line before n/a value 2 lines before n/a
current value 2 value 1 line before 0 value 2 lines before n/a
current value 4 value 1 line before 2 value 2 lines before 0
current value 8 value 1 line before 4 value 2 lines before 2
current value 16 value 1 line before 8 value 2 lines before 4
Explanation: I store the 1st field values in the array arr. I use the so-called ternary operator to test whether a key is present in arr and, if so, print the corresponding value, otherwise n/a.
(tested in GNU Awk 5.3.1)
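One way the same NR-keyed lookup might be applied to the question is sketched below; it was not part of the tested example above. It assumes column 1 holds strictly increasing integers, so a difference of exactly 4 from the value four records back means five consecutive numbers. The function name ends_run_of_five and the array name first are hypothetical, and the sketch only reports where a run of five completes; it does not print the buffered rows or check the 6th column.

```shell
# Illustrative only: report lines that complete a run of five consecutive
# values in column 1, by looking 4 records back in an NR-keyed array.
# Assumes column 1 holds strictly increasing integers.
ends_run_of_five() {
    awk '
    {
        first[NR] = $1 + 0
        if ((NR - 4) in first && first[NR] - first[NR - 4] == 4)
            print "run of five ends at", first[NR]
        delete first[NR - 5]      # older entries are no longer needed
    }
    '
}

printf '%s\n' 8 9 10 11 12 104 105 | ends_run_of_five
# run of five ends at 12
```

Deleting entries older than four records keeps the array small on long inputs.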
Upvotes: 2