Reputation: 33

Extract each line followed by a line with a different value in column two

Given the following file structure,

9.975   1.49000000      0.295   0       0.4880  0.4929  0.5113  0.5245  2.016726        1.0472  -30.7449        1
9.975   1.49000000      0.295   1       0.4870  0.5056  0.5188  0.5045  2.015859        1.0442  -30.7653        1
9.975   1.50000000      0.295   0       0.5145  0.4984  0.4873  0.5019  2.002143        1.0854  -30.3044        2

Is there a way to extract each line whose value in column two differs from the value in column two of the following line? For example, from these three lines I would like to extract the second one, since 1.49 is not equal to 1.50. Maybe with sed or awk?

This is how I do this in MATLAB:

myline = 1;
mynewline = 1;
while myline < length(myfile)
    if myfile(myline,2) ~= myfile(myline+1,2)
        mynewfile(mynewline,:) = myfile(myline,:);
        mynewline = mynewline+1;
        myline = myline+1;
    else
        myline = myline+1;
    end
end

However, my files are now so large that I would prefer to carry out this extraction in the terminal before transferring them to my laptop.

Upvotes: 2

Answers (3)

potong

Reputation: 58488

This might work for you (GNU sed):

sed -r 'N;/^((\S+)\s+){2}.*\n\S+\s+\2/!P;D' file

Read two lines at a time. Pattern match on the first two columns and only print the first line when the second column does not match.

Upvotes: 2

liborm

Reputation: 2724

Awk should do.

<data awk 'NR > 1 && $2 != prev {print line} {line = $0; prev = $2}'

A brief intro to awk: an awk program consists of a set of condition {code} blocks and operates line by line. When no condition is given, the block is executed for every line. A block with the BEGIN condition is executed before the first line is read. Each line is split into fields, which are accessible as $number (for example $1, $2); the full line is in $0.
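To make the block structure concrete, here is a small sketch run on made-up two-column input. The sample data and the counter name n are assumptions for illustration only; END is the counterpart of BEGIN and runs after the last line.

```shell
# Hypothetical input: two lines, two fields each.
result=$(printf 'x 1\ny 2\n' | awk '
  BEGIN  { print "start" }      # runs once, before any input is read
  $2 > 1 { print $1 }           # runs only for lines whose second field exceeds 1
         { n++ }                # no condition: runs for every line
  END    { print n " lines" }   # runs once, after the last line
')
printf '%s\n' "$result"
```

With this input the program prints start, then y, then 2 lines.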

Here I compare the second field to the previous value; if they do not match, I print the whole previous line. In all cases I store the current line in line and the second field in prev.

And if you really want to get it right, be careful with float comparisons: treat two values as equal only when something like abs($2 - prev) < eps holds (there is no abs in awk, so you need to define it yourself, and eps is some suitably small number). I'm actually not sure whether awk converts to numbers for equality testing; if it does not, you're safe with the string comparisons.
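A minimal sketch of that tolerance-based comparison, assuming a tolerance of 1e-9 and a shortened three-column version of the sample data (both are illustrative choices, not taken from the question):

```shell
# Hypothetical, shortened sample data; column two changes after the second row.
data='9.975 1.49000000 0
9.975 1.49000000 1
9.975 1.50000000 0'

result=$(printf '%s\n' "$data" | awk -v eps=1e-9 '
  # awk has no built-in abs, so define one
  function abs(x) { return x < 0 ? -x : x }
  # from the second line on, print the previous line when column two
  # differs from the previous value by more than eps
  NR > 1 && abs($2 - prev) > eps { print line }
  # always remember the current line and its second field
  { line = $0; prev = $2 }
')
printf '%s\n' "$result"
```

Because the question asks for lines where the values differ, the test is abs(...) > eps rather than < eps. The NR > 1 guard keeps the first line from being compared against an unset prev.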

Upvotes: 3

Birei

Reputation: 36282

Try the following command:

awk '$2 != field && field { print line } { field = $2; line = $0 }' infile

It saves the previous line and its second field, and compares them with the current line's values on the next iteration. The && field check avoids printing a blank line at the beginning of the file, where $2 != field would otherwise match because the variable is still empty.
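One caveat with the && field guard: awk treats a numeric-looking value of 0 as false, so a line whose second column is 0 would never be printed. If that can occur in the data, a variant that tests the line number instead may be safer. A sketch on made-up two-column input (the data are an assumption for illustration):

```shell
# Hypothetical input in which column two changes from 0 to 1 after the second row.
data='a 0
b 0
c 1
d 1'

# NR > 1 skips the comparison on the first line without relying on
# the truth value of the stored field.
result=$(printf '%s\n' "$data" |
  awk 'NR > 1 && $2 != field { print line } { field = $2; line = $0 }')
printf '%s\n' "$result"
```

This prints b 0, the last line before column two changes.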

It yields:

9.975   1.49000000      0.295   1       0.4870  0.5056  0.5188  0.5045  2.015859        1.0442  -30.7653        1

Upvotes: 1
