jackscorrow

Reputation: 702

Extract line before first empty line after match

I have a CSV file in this form:

* COMMENT
* COMMENT
100 ; 1706 ; 0.18 ; 0.45 ; 0.00015 ; 0.1485 ; 0.03 ; 1 ; 1 ; 2 ; 280 ; 100 ; 100 ; 

* COMMENT
* COMMENT

* ZT vector
0; 367; p; nan
1; 422; p; nan
2; 1; d; nan

* KS vector
0; 367; p; 236.27
1; 422; p; 236.27
2; 1; d; 236.27



*Total time: 4.04211

I need to extract the last line before an empty line after matching the pattern KS vector.

To be clearer, in the above example I would like to extract the line

2; 1; d; 236.27

since it is the last non-empty line before the first empty line that follows the match on KS vector.

I would also like to use the same script to extract the same kind of line after matching the pattern ZT vector, which in the above example would return

2; 1; d; nan

I need the first number of that line because, plus one, it gives the number of consecutive non-empty lines after KS vector. My current workaround is this:

# counting number of lines after matching "KS vector" until first empty line
var=$(sed -n '/KS vector/,/^$/p' file | wc -l)
# Subtracting 2 to obtain actual number of lines
var=$(($var-2))

But if I could extract the last line directly, I could take its first element (2 in the example) and add 1 to it to obtain the same number.

Upvotes: 2

Views: 656

Answers (3)

Ed Morton

Reputation: 203254

You're going about this the wrong way. All you need is to put awk into paragraph mode and print 1 less than the number of lines in the record (since you don't want to include the KS vector line in your count):

$ awk -v RS= -F'\n' '/KS vector/{print NF-1}' file
3

Here's how awk sees the record when you put it into paragraph mode (by setting RS to null) with newline-separated fields (by setting FS to a newline):

$ awk -v RS= -F'\n' '/KS vector/{ for (i=1;i<=NF;i++) print NF, i, "<"$i">"}' file
4 1 <* KS vector>
4 2 <0; 367; p; 236.27>
4 3 <1; 422; p; 236.27>
4 4 <2; 1; d; 236.27>
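
Since each line of the block is a field in paragraph mode, the last line that the question asks for is simply $NF. The following is a minimal sketch under that assumption, not part of the original answer: `last_line` is a hypothetical helper name, `vec` is an assumed awk variable holding the header text, and the sample file is a trimmed-down copy of the one in the question.

```shell
# Trimmed-down copy of the sample file from the question (temporary file)
sample=$(mktemp)
cat > "$sample" <<'EOF'
* ZT vector
0; 367; p; nan
1; 422; p; nan
2; 1; d; nan

* KS vector
0; 367; p; 236.27
1; 422; p; 236.27
2; 1; d; 236.27



*Total time: 4.04211
EOF

# last_line is a hypothetical helper: $1 = header text, $2 = file
last_line() {
  # In paragraph mode each block is one record and each line one field,
  # so $1 is the header line and $NF is the last line of the block.
  awk -v RS= -F'\n' -v vec="$1" '$1 ~ vec { print $NF; exit }' "$2"
}

last_line "KS vector" "$sample"   # prints: 2; 1; d; 236.27
last_line "ZT vector" "$sample"   # prints: 2; 1; d; nan
```

Note that paragraph mode only treats truly empty lines as separators, so a separator line containing spaces would not end the block.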

Upvotes: 3

RomanPerekhrest

Reputation: 92854

With an awk expression:

awk -v vec="KS vector" '$0~vec{ f=1 }f && !NF{ print r; exit }f{ r=$0 }' file

  • vec - variable containing the needed pattern/vector

  • $0~vec{ f=1 } - on encountering the needed pattern/vector, set the flag f

  • f{ r=$0 } - while the flag f is set (inside the needed vector section), capture the current line into the variable r

  • f && !NF{ print r; exit } - NF is the number of fields, so !NF is true for an empty line; on encountering an empty line inside the needed vector section, print the last captured non-empty line r

  • exit - exit script execution immediately (avoiding redundant actions/iterations)

The output:

2; 1; d; 236.27

If you just want to print the number of lines under the matched vector, use the following:

awk -v vec="KS vector" '$0~vec{ f=1 }f && !NF{ print r+1; exit }f{ r=$1 }' file
3
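
The one-liners above print only when an empty line follows the section. As a hedged variant of the same flag-based approach (an illustration, not the original answer's code), the sketch below moves the printing into an END block so that a section ending at the end of the file is also handled. `last_in_section` is a hypothetical helper name, and the sample file is an assumed variation of the question's data with no trailing empty line.

```shell
# Sample where the KS vector section runs to the end of the file (assumed test data)
sample=$(mktemp)
cat > "$sample" <<'EOF'
* ZT vector
0; 367; p; nan
1; 422; p; nan
2; 1; d; nan

* KS vector
0; 367; p; 236.27
1; 422; p; 236.27
2; 1; d; 236.27
EOF

# last_in_section is a hypothetical helper: $1 = header text, $2 = file
last_in_section() {
  awk -v vec="$1" '
    $0 ~ vec  { f = 1; next }     # header found: set the flag and skip the header line
    f && !NF  { exit }            # first empty line inside the section: stop reading
    f         { r = $0 }          # remember the most recent non-empty line
    END       { if (f) print r }  # END still runs after exit; also covers a section ending at EOF
  ' "$2"
}

last_in_section "ZT vector" "$sample"   # prints: 2; 1; d; nan
last_in_section "KS vector" "$sample"   # prints: 2; 1; d; 236.27
```

Because awk runs the END block even after exit, the print lives in one place and works for both cases.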

Upvotes: 2

Raman Sailopal

Reputation: 12867

With awk:

awk '$0 ~ "KS vector" { valid=1;getline } valid==1 { cnt++;dat[cnt]=$0 } $0=="" { valid="" } END { print dat[cnt-1]  }' filename

Check for any lines matching "KS vector". Set a valid flag and then read in the next line. Read the data into an array with an incremented counter. When an empty line is encountered, reset the valid flag; the empty line itself has already been stored in the array, so at the end print the last but one element of the dat array.
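
An array-based approach like this can report the line count and the last line together. The sketch below is a variation under stated assumptions rather than the answer's exact script: it stores only the non-empty lines of the section and stops at the first empty line, so the last element is the wanted line. `section_summary` is a hypothetical helper name and the "count: line" output format is made up for illustration.

```shell
# Trimmed-down copy of the sample file from the question (temporary file)
sample=$(mktemp)
cat > "$sample" <<'EOF'
* ZT vector
0; 367; p; nan
1; 422; p; nan
2; 1; d; nan

* KS vector
0; 367; p; 236.27
1; 422; p; 236.27
2; 1; d; 236.27



*Total time: 4.04211
EOF

# section_summary is a hypothetical helper: $1 = header text, $2 = file
section_summary() {
  awk -v vec="$1" '
    $0 ~ vec      { valid = 1; next }   # header line: start collecting from the next line
    valid && !NF  { exit }              # an empty line closes the section
    valid         { dat[++cnt] = $0 }   # store each non-empty line of the section
    END           { if (cnt) print cnt ": " dat[cnt] }  # line count, then the last stored line
  ' "$2"
}

section_summary "KS vector" "$sample"   # prints: 3: 2; 1; d; 236.27
```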

Upvotes: 0
