Filtering multiline pcregrep match with sed

Question

I have data in multiple text files that look like this:

1  DAEJ             X            -3120041.6620      -3120042.0476     -0.3856      0.0014               
                    Y             4084614.2137       4084614.6871      0.4734      0.0015               
                    Z             3764026.4954       3764026.7346      0.2392      0.0014               

                    HEIGHT            116.0088           116.6419      0.6332      0.0017      0.0017    8.0
                    LATITUDE     36 23 57.946407    36 23 57.940907   -0.1699      0.0013      0.0012   57.5      0.0012   62.9
                    LONGITUDE   127 22 28.131395   127 22 28.132160    0.0190      0.0012      0.0013    2.3      0.0013

and I want to run it through a filter so that the output will look like this:

DAEJ: 36 23 57.940907, 127 22 28.132160, 116.6419

I can do it easily enough with grepWin using named capture by searching for:

(?\w\w\w\w+)

(?\-?\d+\.\d+)(?\d+\.\d+)
(?\-?\ *\d+\ +\d+\ +\d+\.\d+)(?\d+\.\d+)
(?\-?\ *\d+\ +\d+\ +\d+\.\d+)(?\d+\.\d+)

and replacing with (ignore the unreferenced groups; I'll use them in other implementations):

$+{site}: $+{lat}, $+{lon}, $+{height}

This works, but only manually through a GUI. Is there a way to script it by piping pcregrep output to sed for the text substitution? I know about pcregrep's -M option for matching a multiline pattern like the one above, and I have that part working, but I'm stuck on the sed end of the problem.

Steve · Accepted Answer

I would use awk to process your text file:

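awk '$1 ~ /^[0-9]+$/ { printf "%s: ", $2 } $1 == "HEIGHT" { height = $3 } $1 == "LATITUDE" { printf "%s %s %s, ", $5, $6, $7 } $1 == "LONGITUDE" { printf "%s %s %s, %s\n", $5, $6, $7, height }' file.txt

Broken out on multiple lines for readability:

# Record line ("1  DAEJ  X ..."): the first field is a number, the second is the site name.
$1 ~ /^[0-9]+$/ {
    printf "%s: ", $2
}

# Remember the second HEIGHT value; it is printed last.
$1 == "HEIGHT" {
    height = $3
}

# Print the second latitude (fields 5-7: degrees, minutes, seconds).
$1 == "LATITUDE" {
    printf "%s %s %s, ", $5, $6, $7
}

# Print the second longitude, then the stored height, and end the line.
$1 == "LONGITUDE" {
    printf "%s %s %s, %s\n", $5, $6, $7, height
}

Results:

DAEJ: 36 23 57.940907, 127 22 28.132160, 116.6419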
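Since the question asks about sed specifically, here is a sketch of a sed-only version of the same extraction, wrapped in a hypothetical shell function extract_coords. It assumes GNU sed (for -E, and for \n matching the newlines that the H command puts into the hold space) and the fixed column layout of the sample. Each wanted value is appended to the hold space; the LONGITUDE line, which ends a record, swaps the hold space in, reorders the four pieces and prints them. Like the awk version above, it does not handle a minus sign separated from the degrees by spaces.

```shell
# Hypothetical helper: sed-only extraction, assuming GNU sed and the sample layout.
extract_coords() {
  sed -n -E '
    # Record line ("1  DAEJ  X ..."): keep the site name and start a fresh hold space.
    /^ *[0-9]+ +[A-Za-z0-9]+ +X /{
      s/^ *[0-9]+ +([A-Za-z0-9]+) .*/\1/
      h
    }
    # HEIGHT: keep the second value and append it to the hold space.
    /^ *HEIGHT /{
      s/^ *HEIGHT +[^ ]+ +([^ ]+).*/\1/
      H
    }
    # LATITUDE: skip the first degrees-minutes-seconds triple, keep the second.
    /^ *LATITUDE /{
      s/^ *LATITUDE +[^ ]+ +[^ ]+ +[^ ]+ +([^ ]+ +[^ ]+ +[^ ]+).*/\1/
      H
    }
    # LONGITUDE: same, then swap in the hold space, reorder and print.
    /^ *LONGITUDE /{
      s/^ *LONGITUDE +[^ ]+ +[^ ]+ +[^ ]+ +([^ ]+ +[^ ]+ +[^ ]+).*/\1/
      H
      x
      s/^(.*)\n(.*)\n(.*)\n(.*)$/\1: \3, \4, \2/
      p
    }
  ' "$@"
}
```

Usage: extract_coords file.txt. Because sed reads standard input when no file is given, the function could also sit at the end of an existing pcregrep -M pipeline, provided pcregrep passes the HEIGHT, LATITUDE and LONGITUDE lines through unchanged.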

EDIT:

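Put the following code in a file called script.awk. This version finds the site name on the line whose third field is X, and also handles negative values whose minus sign is separated from the degrees by spaces, which makes the sign a field of its own: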

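# Site line: the third field is the X component label.
$3 == "X" {
    printf "%s: ", $2
}

# Remember the second HEIGHT value; it is printed last.
$1 == "HEIGHT" {
    height = $3
}

# A minus sign written apart from its degrees ("- 36") is a field of its own,
# which shifts the field numbers. Pick the second latitude accordingly.
$1 == "LATITUDE" {
    if ($2 == "-" && $6 == "-") { printf "-%s %s %s, ", $7, $8, $9 }   # both minus signs separated
    else if ($2 == "-") { printf "%s %s %s, ", $6, $7, $8 }            # only the first one separated
    else if ($5 == "-") { printf "-%s %s %s, ", $6, $7, $8 }           # only the second one separated
    else { printf "%s %s %s, ", $5, $6, $7 }                           # no separated minus sign
}

# Same for the longitude, then append the height and end the line.
$1 == "LONGITUDE" {
    if ($2 == "-" && $6 == "-") { printf "-%s %s %s, %s\n", $7, $8, $9, height }
    else if ($2 == "-") { printf "%s %s %s, %s\n", $6, $7, $8, height }
    else if ($5 == "-") { printf "-%s %s %s, %s\n", $6, $7, $8, height }
    else { printf "%s %s %s, %s\n", $5, $6, $7, height }
}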

Run like this:

awk -f script.awk file.txt
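The if/else chains in script.awk exist only because a separated minus sign shifts the field numbers. A shorter alternative, sketched below as a hypothetical shell function extract_signed (not part of the answer above), is to glue any standalone minus sign onto the number that follows it with gsub before reading the fields. Changing $0 makes awk re-split the record, so the second latitude and longitude are then always $5, $6 and $7.

```shell
# Hypothetical helper: normalize "- 36" to "-36" so field positions never shift.
extract_signed() {
  awk '
    # Site line: the third field is the X component label.
    $3 == "X" { printf "%s: ", $2 }
    # Glue a standalone minus sign onto the next number; gsub on $0 re-splits the fields.
    $1 == "LATITUDE" || $1 == "LONGITUDE" { gsub(/- +/, "-") }
    # Remember the second HEIGHT value; it is printed last.
    $1 == "HEIGHT" { height = $3 }
    # Second latitude and longitude are now always fields 5-7.
    $1 == "LATITUDE" { printf "%s %s %s, ", $5, $6, $7 }
    $1 == "LONGITUDE" { printf "%s %s %s, %s\n", $5, $6, $7, height }
  ' "$@"
}
```

Usage: extract_signed file.txt. For the four sign layouts that script.awk distinguishes, it should print the same lines, e.g. a latitude written as "- 36 23 57.946407  - 36 23 57.940907" comes out as -36 23 57.940907.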
