Reputation: 33
Given the following file structure,
9.975 1.49000000 0.295 0 0.4880 0.4929 0.5113 0.5245 2.016726 1.0472 -30.7449 1
9.975 1.49000000 0.295 1 0.4870 0.5056 0.5188 0.5045 2.015859 1.0442 -30.7653 1
9.975 1.50000000 0.295 0 0.5145 0.4984 0.4873 0.5019 2.002143 1.0854 -30.3044 2
is there a way to extract each line in which the value in column two is not equal to the value in column two in the following line? I.e. from these three lines I would like to extract the second one, since 1.49 is not equal to 1.50. Maybe with sed or awk?
This is how I do this in MATLAB:
myline = 1;
mynewline = 1;
while myline < length(myfile)
if myfile(myline,2) ~= myfile(myline+1,2)
mynewfile(mynewline,:) = myfile(myline,:);
mynewline = mynewline+1;
myline = myline+1;
else
myline = myline+1;
end
end
However, my files are so large now that I would prefer to carry out this extraction in terminal before transferring them to my laptop.
Upvotes: 2
Views: 273
Reputation: 58488
This might work for you (GNU sed):
sed -r 'N;/^((\S+)\s+){2}.*\n\S+\s+\2/!P;D' file
Read two lines at a time. Pattern match on the first two columns and only print the first line when the second column does not match.
Upvotes: 2
Reputation: 2724
Awk should do.
<data awk '($2 != prev) {print line} {line = $0; prev = $2}'
A brief intro to awk: awk program consists of a set of condition {code}
blocks. It operates line by line. When no condition is given, the block is executed for each line. BEGIN
condition is executed before the first line. Each line is split to fields, which are accessible with $_number_
. The full line is in $0
.
Here I compare the second field to the previous value, if it does not match I print
the whole previous line. In all cases I store the current line into line
and the second field into prev
.
And if you really want it right, careful with the float comparisons - something like abs($2 - prev) < eps
(there is no abs
in awk, you need to define it yourself, and eps is some small enough number). I'm actually not sure if awk converts to number for equality testing, if not you're safe with the string comparisons.
Upvotes: 3
Reputation: 36282
Try following command:
awk '$2 != field && field { print line } { field = $2; line = $0 }' infile
It saves previous line and second field, comparing in next loop with current line values. The && field
check is useful to avoid a blank line at the beginning of file, when $2 != field
would match because variable is empty.
It yields:
9.975 1.49000000 0.295 1 0.4870 0.5056 0.5188 0.5045 2.015859 1.0442 -30.7653 1
Upvotes: 1