Reputation: 61
I am new to Bash and Linux programming, and I have a small problem.
For a given cut-off (c), I want to write out the values that are above c, but only when two consecutive values are above c. For example:
x y
1 0.34
2 0.3432
3 0.32
4 0.35
5 0.323
6 0.3623
7 0.345
With c=0.33, it should print these values from column 2:
0.34
0.3432
0.3623
0.345
It will not print 0.35, even though it is above the cut-off of 0.33, because the next value after 0.35 is 0.323, which fails the condition 'two consecutive values are above c'.
Upvotes: 0
Views: 199
Reputation: 103783
The way you use a Bash parameter in awk is like so:
$ c=2.3
$ awk -v c="$c" 'BEGIN{print c}'
2.3
You can then use that to write your script like so:
c=0.33
m=2
awk -v c="$c" -v m="$m" '($2+0!=$2) {next}
$2+0<c {cnt=0; split("",lst); next}
$2+0>=c && cnt<m {lst[++cnt]=$2}
$2+0>=c && cnt==m {for (i=1; i<=m; i++) print lst[i]
cnt=0; split("",lst)}' file
That will not print overlapping ranges such as:
1 0.34
2 0.3432 # prints 0.34\n0.3432\n here
3 0.35 # unclear if it should print 0.3432\n0.35\n here....
Given the update, this will print contiguous runs of lines.
Given:
$ cat file
x y
1 0.34
2 0.3432
2a 0.35
3 0.32
4 0.35
5 0.323
6 0.3623
7 0.345
You can do:
c=0.33
m=2
awk -v c="$c" -v m="$m" '($2+0!=$2) {next}
$2+0>=c {lst[++cnt]=$2; next}
$2+0<c { if (cnt>=m) for (i=1; i<=cnt; i++) print lst[i]
cnt=0; split("",lst); next}
END{if (cnt>=m) for (i=1; i<=cnt; i++) print lst[i]}' file
Prints:
0.34
0.3432
0.35
0.3623
0.345
Upvotes: 0
Reputation: 26471
Original Question: print all sequences where 2 or more consecutive values satisfy a given condition
The following should work:
awk 'p || (prev>c && $2>c && NR>2){print prev}
{ p = (prev>c && $2>c && NR>2); prev=$2 }
END{if(p) print $2 }' c=0.33 <file>
It implements the following logic:
p keeps track of whether the previous line has been printed. If it has, the current line should also be printed.
If the previous line has not been printed (p==0), then you should check whether to print the previous line: if (prev>c && $2>c).
Update p for the next line and set prev to the current value.
At the end, if p==1, print the last value.
You essentially always run one line behind.
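The "one line behind" idea can also be sketched with explicit flags. This is a minimal variant under stated assumptions, not the exact script above: the helper name runs_above and the flag names cur_ok, prev_ok and before_ok are made up for illustration, the first input line is assumed to be a header, and the value is assumed to sit in column 2.

```shell
#!/bin/sh
# runs_above is a hypothetical helper, not part of the answer above.
# Usage: runs_above CUTOFF < data   (first input line is assumed to be a header)
runs_above() {
    awk -v c="$1" '
        NR == 1 { next }                      # skip the header line
        {
            cur_ok = ($2 + 0 > c + 0)         # does the current value pass?
            # The previous value is part of a run if it passed and has
            # a passing neighbour on either side.
            if (prev_ok && (before_ok || cur_ok)) print prev
            before_ok = prev_ok; prev_ok = cur_ok; prev = $2
        }
        END { if (prev_ok && before_ok) print prev }   # flush the last line
    '
}

printf 'x y\n1 0.34\n2 0.3432\n3 0.32\n4 0.35\n5 0.323\n6 0.3623\n7 0.345\n' |
    runs_above 0.33
# prints 0.34, 0.3432, 0.3623 and 0.345, one per line
```

Forcing numeric comparison with +0 on both sides avoids the string comparison that a non-numeric header field would otherwise trigger.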
Another way to approach this is checking if the value satisfies the condition and store it in an array. If you encounter a value that does not satisfy the condition, process the array. This is a bit more memory intensive :
awk '(NR==1){next}
($2>c) { a[NR]=$2; next }
(length(a) == 1) { delete a[NR-1]; next }
{ for(i=NR-length(a);i<NR;++i) {print a[i]; delete a[i]} }
END { if (length(a)>1) for(i=NR+1-length(a);i<=NR;++i) {print a[i]} }
' c=0.33 <file>
Second question: print the subset of consecutive values of $2 for which m or more values satisfy condition cond and at most n consecutive values do not satisfy cond. The sequence starts and ends with a value satisfying cond.
The following awk script will do this. Don't forget to adjust the values m, n and c to your wishes and update the conditional function.
function cond(val) { return val > c }
BEGIN{c=0.33; m=2; n=1}
# skip the header
(NR==1){next}
# if no values satisfy cond ...
(M==0 && !cond($2)) { next }
# ... otherwise continue from here
{ a[NR]=$2 }
# set counters M and N (M satisfy cond, N not )
cond($2) { M++; N=0 }
!cond($2) { N++ }
# This sequence failed, delete it
(N>n && M<m) { for(i in a) delete a[i]; M=0; N=0 }
# This sequence is OK, strip it and print it
(N>n) { j=NR; while (!cond(a[j])) delete a[j--]
for (i=j+1-length(a);i<=j;++i) { print a[i]; delete a[i] }
M=0; N=0 }
# Check if the final stored sequence is successful
END { if (M>=m) {
j=NR; while (!cond(a[j])) delete a[j--]
for (i=j+1-length(a);i<=j;++i) print a[i]
}
}
Upvotes: 1
Reputation: 241828
Perl solution:
c=.33 m=2 perl -lane '
if ($F[1] > $ENV{c}) { push @r, $F[1] }
else {
if (@r >= $ENV{m}) { print for @r }
@r = ();
}
END { if (@r >= $ENV{m}) { print for @r } }' -- file
It stores the consecutive values in an array @r. If the current value is under the threshold, it prints the array if it's long enough.
-l removes newlines from input and adds them to output
-n reads the input line by line
-a autosplits each line into the @F array
If the sequences tend to be very long, you can keep at most m elements in the array to save some memory:
c=.33 m=2 perl -lane '
if ($F[1] > $ENV{c}) {
push @r, $F[1];
print shift @r if @r > $ENV{m};
} else {
if (@r >= $ENV{m}) { print for @r }
@r = ();
}
END { if (@r >= $ENV{m}) { print for @r } }' -- file
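For comparison, the same bounded-memory idea can be sketched in awk. This is an illustrative sketch rather than a translation of the Perl above: the helper name long_runs is hypothetical, a single header line is assumed, and it holds at most m-1 values before a run is known to be long enough, then prints the rest of the run as it is read.

```shell
#!/bin/sh
# long_runs is a hypothetical helper. Usage: long_runs CUTOFF MINLEN < data
# Assumes one header line and the value in column 2.
long_runs() {
    awk -v c="$1" -v m="$2" '
        NR == 1 { next }                       # skip the header line
        $2 + 0 > c + 0 {
            n++                                # length of the current run
            if (n < m) buf[n] = $2             # run still too short: hold the value
            else {
                if (n == m)                    # run just became long enough:
                    for (i = 1; i < m; i++) print buf[i]   # flush held values
                print $2                       # then print as we go
            }
            next
        }
        { n = 0 }                              # run broken: start over
    '
}

printf 'x y\n1 0.34\n2 0.3432\n2a 0.35\n3 0.32\n4 0.35\n5 0.323\n6 0.3623\n7 0.345\n' |
    long_runs 0.33 2
# prints 0.34, 0.3432, 0.35, 0.3623 and 0.345, one per line
```

Because values are printed as soon as the run reaches length m, no END block is needed and memory use does not grow with the run length.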
Upvotes: 0
Reputation: 13249
You could use this awk script:
awk -v cutoff="0.33" '
NR==1{next} # skip the header line
$2>cutoff{
if(prev)
{print prev ORS $2 }
else
{prev=$2;next}
}
{prev=""}' file
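It stores a value that is above the cutoff in the prev variable, prints the pair when the next value is also above the cutoff, and resets prev after printing a pair or when a value is not above the cutoff.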
Upvotes: 0