Ajay Sridhara

Reputation: 39

AWK - Parsing SQL output

I have SQL output like the sample below from a custom tool. I would appreciate any help in finding what I am doing incorrectly.

column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10            |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
 cec75                   | 1234     | 007    |         |    2810 |         | SOME_TEXT       |         |         | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)

I am trying to extract the columns I need, in my case column1, column2, and column7. I have tried piping the output like this, but it just prints column1:

tool check | awk '{print $1, $2}'

column1 |
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+------------------------------- 
cec75 |
(1 row) 

It would be nice to have something like this.

cec75,1234,SOME_TEXT

My file contents


                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 6601c | 2396     | 123         |             |               9350 |                       | SOME_TEXT |               |                | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)


                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 cec75 | 1567     | 007        |             |               2810 |                       | SOME_TEXT |               |                | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)

Upvotes: 3

Views: 1117

Answers (3)

Daweo

Reputation: 36450

You need to set a correct FS and somehow filter out the undesired (junk) lines. I would do it the following way. Let file.txt content be:

column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10            |          column11          
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
 cec75                   | 1234     | 007    |         |    2810 |         | SOME_TEXT       |         |         | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)

then

awk 'BEGIN{FS="[[:space:]]+\\|[[:space:]]+";OFS=","}(NR>=2 && NF>=2){print $1,$2,$7}' file.txt

output:

cec75,1234,2020-12-07 20:28:46.865+00

Explanation: I set the field separator (FS) to one or more :space:, a literal |, then one or more :space:, where :space: means any whitespace character. Depending on your data you might elect to use zero or more rather than one or more; to do so, replace + with *. For every line which is not the first one (this filters out the header) and has at least 2 fields (this filters out the line of - and + and the (1 row) line), I print the content of the 1st column, followed by ,, followed by the content of the 2nd column, followed by ,, followed by the content of the 7th column.
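For instance, the zero-or-more variant mentioned above can be checked end to end. This is a minimal sketch: the file name sample.txt and the trimming of the leading space on $1 (which a regex FS leaves in place) are my additions.

```shell
# Recreate the sample data from the answer (the file name sample.txt is assumed).
cat > sample.txt <<'EOF'
column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10            |          column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------
 cec75                   | 1234     | 007    |         |    2810 |         | SOME_TEXT       |         |         | 2020-12-07 20:28:46.865+00 | 2020-12-08 06:40:10.231635+00
(1 row)
EOF

# With * instead of +, the empty columns survive as empty fields, so $7
# is SOME_TEXT rather than the timestamp; sub() trims the leading space
# that a regex FS leaves on $1.
awk 'BEGIN{FS="[[:space:]]*\\|[[:space:]]*";OFS=","}
     (NR>=2 && NF>=2){sub(/^ +/,"",$1); print $1,$2,$7}' sample.txt
```

This prints cec75,1234,SOME_TEXT: the dash line and (1 row) contain no |, so they have NF of 1 and are filtered out by NF>=2.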

Upvotes: 2

Michael Back

Reputation: 1871

Description:

Command line switches...

  • The delimiter is | surrounded by spaces. (Note that we need to use a couple of \'s to escape | if we feed the regex for the delimiter in from the command line.)
  • In addition to input delimiter (input field separator) the output delimiter (output field separator) can also be set using a command line switch.

The awk script...

  • If a header is encountered or a ( is seen on a line, it's not a valid line; so, just ignore it.
  • If the line has any alphanumeric characters, it's a valid line to operate on; so we strip the leading spaces off the line and then print the columns we want.
tool check | awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }'

Examining the data more closely... It looks as though the date-stamp (which always has a : in it) might be present on all valid records... If so, the script can be reduced to something much simpler.

tool check | awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }'
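As a sketch of how both one-liners behave, here they are run against the two-record sample from the question; the file name records.txt is my assumption (the originals pipe from tool check instead).

```shell
# Recreate the two-record sample from the question (records.txt is assumed).
cat > records.txt <<'EOF'
                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 6601c | 2396     | 123         |             |               9350 |                       | SOME_TEXT |               |                | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 cec75 | 1567     | 007        |             |               2810 |                       | SOME_TEXT |               |                | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
EOF

# First form: skip headers and "(1 row)" lines, keep lines with alphanumerics.
awk -F' *\\| *' -v OFS=, '/column|\(/ { next } /[[:alnum:]]/ { sub(/^ +/, ""); print $1, $2, $7 }' records.txt

# Second form: the date-stamp heuristic -- only valid records have a ":" in $10.
awk -F' *\\| *' -v OFS=, '$10 ~ /:/ { sub(/^ +/, ""); print $1, $2, $7 }' records.txt
```

Both commands print 6601c,2396,SOME_TEXT and cec75,1567,SOME_TEXT; the dash lines contain no | or alphanumerics, so neither pattern matches them.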

Upvotes: 1

RavinderSingh13

Reputation: 133518

EDIT: Since the OP added an edited set of samples, I am adding this solution now. It assumes that you want to print the lines that come after lines starting with ---.

awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}' Input_file

OR

your_command | 
awk -F'[[:space:]]*\\|[[:space:]]*' '/^---/{found=1;next} found{print $1,$2,$7;found=""}'
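A runnable sketch of this approach against the edited sample, under two assumptions of mine: the file name Input_file, plus OFS="," and a leading-space trim on $1 (the original one-liner emits space-separated fields and keeps the leading blank that a regex FS leaves on $1).

```shell
# Recreate the edited sample from the question (the name Input_file is assumed).
cat > Input_file <<'EOF'
                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 6601c | 2396     | 123         |             |               9350 |                       | SOME_TEXT |               |                | 2020-12-07 22:49:01.023+00 | 2020-12-08 07:22:37.419669+00
(1 row)
                  column1                  | column2 | column3 | column4 | column5 | column6 |     column7     | column8 | column9 |        column10         |          column11
--------------------------------------+----------+-------------+-------------+--------------------+-----------------------+--------------------+---------------+----------------+----------------------------+-------------------------------
 cec75 | 1567     | 007        |             |               2810 |                       | SOME_TEXT |               |                | 2020-12-07 20:28:46.865+00 | 2020-12-08 07:28:10.319888+00
(1 row)
EOF

# Print the single line that follows each "---" separator line; OFS=","
# and the sub() are additions here to get comma-separated output.
awk -F'[[:space:]]*\\|[[:space:]]*' -v OFS=, \
    '/^---/{found=1;next} found{sub(/^ +/,"",$1); print $1,$2,$7;found=""}' Input_file
```

The found flag is set by each dash line and cleared after printing one record, so headers and (1 row) lines never reach the print.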

Upvotes: 1
