WrathWolf

Reputation: 45

Removing text up to a blank line

So I have a report log file that lists a bunch of source files, some of which are missing. I want to clear out the files that are fine. Given the example, how would I remove the line "The following files have been resolved:" and everything after it until the space? The number of resolved files varies, so I can't use a set number of lines after I see that phrase.

Example:

 ------------------------------------------------------------------------
 Building karaf-parent 1.5.0-SNAPSHOT
 ------------------------------------------------------------------------

 --- maven-dependency-plugin:2.10:sources (default-cli) @ karaf-parent ---

 The following files have been resolved:
    org.opendaylight.controller:karaf.branding:jar:sources:1.1.0-SNAPSHOT:compile
    org.opendaylight.controller:opendaylight-karaf-resources:jar:sources:1.5.0-SNAPSHOT:compile

 The following files have NOT been resolved:
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime

Again, the only thing I'm looking for is the package name and the files that have NOT been resolved.

I'm sure that there is some sed/awk command that I can run. But I just don't use regex enough to know the answer. :(

When I try to look it up, all I get is "remove blank line", which isn't really what I'm looking for.

Thanks in advance.

Upvotes: 1

Views: 156

Answers (4)

JJoao

Reputation: 5357

perl -n0E 'say $1 while /NOT been resolved:\n(.*?\n)\n/gs'

Upvotes: 0

John1024

Reputation: 113984

how would I remove the line "The following files have been resolved:" and everything after it until the space?

I assume by space, you mean the space created by an empty line.

Using sed:

$ sed '/The following files have been resolved/,/^$/d' file
 ------------------------------------------------------------------------
 Building karaf-parent 1.5.0-SNAPSHOT
 ------------------------------------------------------------------------

 --- maven-dependency-plugin:2.10:sources (default-cli) @ karaf-parent ---

 The following files have NOT been resolved:
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime

Using awk:

$ awk '/The following files have been resolved/,/^$/{next;} 1' file
------------------------------------------------------------------------
 Building karaf-parent 1.5.0-SNAPSHOT
 ------------------------------------------------------------------------

 --- maven-dependency-plugin:2.10:sources (default-cli) @ karaf-parent ---

 The following files have NOT been resolved:
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime

Alternate Problem: keeping only the unresolved files

$ awk '/The following files have NOT been resolved/,/^$/' file
 The following files have NOT been resolved:
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime

Or, without the header:

$ awk ' /^$/{f=0} f{print} /The following files have NOT been resolved/{f=1}' file
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime
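The question also asks for the package name next to the unresolved files. A minimal sketch of that, extending the flag approach above: it assumes the package line can be recognized by the word "Building ", which holds for the sample but may also match unrelated lines in a real Maven log. The file name `sample.log` and the function name `unresolved_report` are made up for illustration.

```shell
# Hypothetical sample file; the content mirrors the example in the question.
cat > sample.log <<'EOF'
 ------------------------------------------------------------------------
 Building karaf-parent 1.5.0-SNAPSHOT
 ------------------------------------------------------------------------

 --- maven-dependency-plugin:2.10:sources (default-cli) @ karaf-parent ---

 The following files have been resolved:
    org.opendaylight.controller:karaf.branding:jar:sources:1.1.0-SNAPSHOT:compile

 The following files have NOT been resolved:
    org.apache.karaf.features:standard:xml:sources:3.0.3:runtime

EOF

# Print the "Building ..." line plus every entry listed under the NOT-resolved header.
unresolved_report() {
  awk '
    /Building /               { print; next }  # package line (assumed marker)
    /^[[:space:]]*$/          { f = 0 }        # a blank line ends the block
    f                         { print }        # inside the NOT-resolved block
    /have NOT been resolved/  { f = 1 }        # start printing on the next line
  ' "$1"
}

unresolved_report sample.log
```

The rule order matters: the flag is set after the print rule, so the header itself is skipped, and it is cleared before the print rule, so the terminating blank line is skipped too.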

Revised Problem

From a pastebin sample log, none of the empty lines are actually empty. They all have at least one space. We can handle that. With a POSIX sed, the following should work:

sed '/The following files have been resolved/,/^[[:space:]]*$/d' monitor.log

[:space:] is the unicode-safe way of specifying white space. If your sed does not support it, then use the following (note that \t as a tab is itself a GNU sed extension):

sed '/The following files have been resolved/,/^[ \t]*$/d' monitor.log
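To see why the whitespace-tolerant pattern matters, here is a small sketch on a made-up file (`spaced.log`, with invented entries) whose separator lines each hold a single space. With `/^$/` the end of the range is never found, so sed deletes through the end of the file.

```shell
# Build a tiny log whose "blank" separator lines contain one space.
printf '%s\n' \
  'header' \
  'The following files have been resolved:' \
  '   a:b:jar' \
  ' ' \
  'The following files have NOT been resolved:' \
  '   c:d:jar' \
  ' ' > spaced.log

# /^$/ never matches here, so the range runs to the end of the file.
strict=$(sed '/The following files have been resolved/,/^$/d' spaced.log)

# /^[[:space:]]*$/ matches the whitespace-only line and closes the range there.
loose=$(sed '/The following files have been resolved/,/^[[:space:]]*$/d' spaced.log)

printf 'strict:\n%s\n\nloose:\n%s\n' "$strict" "$loose"
```

The strict version keeps only the `header` line, while the loose version keeps the header and the whole NOT-resolved block.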

Further, in the unedited log, the lines of interest begin with [INFO]. The following will work whether or not the lines start with [INFO]:

sed '/The following files have been resolved/,/^\([[]INFO[]]\)\?[ \t\r]*$/d' monitor.log

For example, consider this sample (extracted from the pastebin source):

$ cat log2
[INFO] ------------------------------------------------------------------------
[INFO] Building yang-data-impl 0.7.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.10:sources (default-cli) @ yang-data-impl ---
[INFO] 
[INFO] The following files have been resolved:
[INFO]    org.opendaylight.yangtools:yang-binding:jar:sources:0.7.0-SNAPSHOT:compile
[INFO]    org.opendaylight.yangtools:yang-common:jar:sources:0.7.0-SNAPSHOT:compile
[INFO]    org.ow2.asm:asm:jar:sources:4.0:test
[INFO] 
[INFO] The following files have NOT been resolved:
[INFO]    antlr:antlr:jar:sources:2.7.7:test
[INFO] 

Our sed command works as follows:

$ sed '/The following files have been resolved/,/^\([[]INFO[]]\)\?[ \t\r]*$/d' log2
[INFO] ------------------------------------------------------------------------
[INFO] Building yang-data-impl 0.7.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.10:sources (default-cli) @ yang-data-impl ---
[INFO] 
[INFO] The following files have NOT been resolved:
[INFO]    antlr:antlr:jar:sources:2.7.7:test
[INFO] 

Upvotes: 1

WrathWolf

Reputation: 45

Thanks to @John1024 I got on the right track.

However, I found the answer to be the following:

sed '/The following files have been resolved/,/^[ \t]*$/d' file 

Upvotes: 0

user4401178

Reputation:

sed 1,/"NOT been resolved:"/d file

This works if you are sure that the unresolved entries are the last thing in the file, with no further text after them (otherwise you will need to grab only the paragraph that follows the match). It works by deleting all lines from line one up to and including the match.
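If more text does follow, one possible variant is to add a second expression that drops everything from the next blank (or whitespace-only) line to the end of the file. This is only a sketch: it keeps just the first unresolved block, and the file name `tail.log`, its contents, and the function name `first_unresolved` are invented for the demonstration.

```shell
# Hypothetical log with extra text after the unresolved block.
cat > tail.log <<'EOF'
 Building karaf-parent 1.5.0-SNAPSHOT

 The following files have been resolved:
    g:resolved:jar

 The following files have NOT been resolved:
    g:missing-one:jar
    g:missing-two:jar

 Trailing summary text
EOF

# 1st expression: delete from line 1 through the matching header.
# 2nd expression: delete from the next blank line through the last line.
first_unresolved() {
  sed -e '1,/NOT been resolved:/d' -e '/^[[:space:]]*$/,$d' "$1"
}

first_unresolved tail.log
```

Blank lines before the header do not trigger the second range, because the first `d` ends the cycle before the second expression is evaluated.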

Upvotes: 0
