Symeon Mattes

Reputation: 1268

Using awk to extract a specific pattern if it matches, otherwise joining it with the previous line

I have the following file:

[INFO]   com.fasterxml.jackson.core:jackson-core ............. 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.core:jackson-databind ......... 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-avro ...
[INFO]                                                         2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-cbor ...
[INFO]                                                         2.10.4 -> 2.11.0

What I would like to have at the end is:

com.fasterxml.jackson.core:jackson-core@2.10.4 -> 2.11.0
com.fasterxml.jackson.core:jackson-databind@2.10.4 -> 2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-avro@2.10.4 -> 2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-cbor@2.10.4 -> 2.11.0

How could I achieve that?

Removing the [INFO] and replacing the ... with @ is quite straightforward, i.e.

cat temp1 | sed "s/\[INFO\]//g" | sed "s/\s//g" | sed -E "s/\.\.+/@/"

which produces:

com.fasterxml.jackson.core:jackson-core@2.10.4->2.11.0
com.fasterxml.jackson.core:jackson-databind@2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-avro@
2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-cbor@
2.10.4->2.11.0

However, I'm not sure how to use awk to get every line containing ->, such that if the line doesn't have ... the previous line is joined with the current one, and otherwise only the current line is printed.

Thanks

Upvotes: 2

Views: 64

Answers (2)

markp-fuso

Reputation: 34094

Using awk to address just the 'next step' after the OP's cat/sed/sed/sed step ...

For testing purposes:

$ cat com.data
com.fasterxml.jackson.core:jackson-core@2.10.4->2.11.0
com.fasterxml.jackson.core:jackson-databind@2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-avro@
2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-cbor@
2.10.4->2.11.0

One awk solution:

awk '
BEGIN  { pfx="" }                                       # initial printf prefix is empty string

/^com/ { printf "%s%s", pfx, $0 ; pfx="\n" ; next }     # if line starts with "com" then print prefix and current line (no trailing newline)
                                                        # set prefix to "\n" for rest of script
                                                        # skip to next line of input

       { printf "%s", $0 }                              # otherwise printf current line

END    { printf "\n" }                                  # print one last "\n" once done
' com.data

NOTE: Remove comments to declutter code.

The above generates:

com.fasterxml.jackson.core:jackson-core@2.10.4->2.11.0
com.fasterxml.jackson.core:jackson-databind@2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-avro@2.10.4->2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-cbor@2.10.4->2.11.0

OP should be able to implement this with:

cat temp1 | sed "s/\[INFO\]//g" | sed "s/\s//g" | sed -E "s/\.\.+/@/" | awk 'BEGIN ... '
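Spelled out in full, the pipeline might look like the sketch below. It makes a few assumptions that are not in the original post: the sample input is held in a shell variable (`input`) rather than in the file temp1, the three sed calls are merged into one invocation, and `[[:space:]]` stands in for `\s` so the expression does not depend on GNU sed. The awk body is the script shown above with the comments removed.

```shell
# Sample input copied from the question (held in a variable for this sketch;
# the original post reads it from the file temp1).
input='[INFO]   com.fasterxml.jackson.core:jackson-core ............. 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.core:jackson-databind ......... 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-avro ...
[INFO]                                                         2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-cbor ...
[INFO]                                                         2.10.4 -> 2.11.0'

joined=$(printf '%s\n' "$input" |
  # Strip the [INFO] tag, drop all whitespace, turn the first run of dots into @.
  sed -E -e 's/\[INFO\]//g' -e 's/[[:space:]]//g' -e 's/\.\.+/@/' |
  # Start a new output line at each "com..." line; append anything else to it.
  awk '
    BEGIN  { pfx = "" }
    /^com/ { printf "%s%s", pfx, $0; pfx = "\n"; next }
           { printf "%s", $0 }
    END    { printf "\n" }
  ')

printf '%s\n' "$joined"
```

Because all whitespace is removed before awk runs, the arrow comes out as `->` with no surrounding spaces, matching the output shown above rather than the spaced form in the question.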

Personally, I'd probably scrap the entire cat/sed/sed/sed block in favor of a single awk solution (e.g., Ed's answer).

Upvotes: 0

Ed Morton

Reputation: 203229

$ awk 'NF!=4{tag=$2} NF>3{print tag "@" $(NF-2), $(NF-1), $NF}' file
com.fasterxml.jackson.core:jackson-core@2.10.4 -> 2.11.0
com.fasterxml.jackson.core:jackson-databind@2.10.4 -> 2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-avro@2.10.4 -> 2.11.0
com.fasterxml.jackson.dataformat:jackson-dataformat-cbor@2.10.4 -> 2.11.0
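The one-liner comes without commentary, so here is one reading of it as an annotated sketch. It relies on awk's default whitespace splitting: a line with name, dots and versions has 6 fields, a name-only line has 3, and a versions-only continuation line has exactly 4. The variable names `input` and `result` are introduced for this illustration only, and the sketch assumes artifact names never contain whitespace.

```shell
# Sample input copied from the question.
input='[INFO]   com.fasterxml.jackson.core:jackson-core ............. 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.core:jackson-databind ......... 2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-avro ...
[INFO]                                                         2.10.4 -> 2.11.0
[INFO]   com.fasterxml.jackson.dataformat:jackson-dataformat-cbor ...
[INFO]                                                         2.10.4 -> 2.11.0'

result=$(printf '%s\n' "$input" | awk '
  # Every line except a 4-field continuation line has the artifact name
  # in field 2, so remember it for later.
  NF != 4 { tag = $2 }

  # Lines with more than 3 fields end in "old -> new": print the remembered
  # name, an @, and the last three fields separated by single spaces.
  NF > 3  { print tag "@" $(NF-2), $(NF-1), $NF }
')

printf '%s\n' "$result"
```

Name-only lines (3 fields) update `tag` but print nothing, and the following continuation line (4 fields) prints using the saved name, which is how the wrapped entries end up on one line.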

Upvotes: 2
