Rajeev

Reputation: 46909

eliminate unwanted output using awk and sed

In the output of the following command, how can I eliminate all the lines that occur before

 Owner     RepoName             CreatedDate

EDIT Command:

find /opt/site/ -name '.log.txt' | xargs cat | awk '{$NF=""; print $0}' | sed '1i Owner RepoName CreatedDate' | column -t

The output is

find: Filesystem loop detected; `/nfs/.snapshot/nightly.4' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.5' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.6' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD 

Required output is only:

Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD 

Upvotes: 1

Views: 3686

Answers (5)

S0AndS0

Reputation: 920

This is totally doable with Awk scripting...

#!/usr/bin/awk -f

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--from=") {
      _from = substr(ARGV[i], 8)
      delete ARGV[i]
    }
  }

  if (!_from) {
    print "No '--from' argument provided!" > "/dev/stderr"
    # Stop here: an empty pattern would match (and print) every line
    exit 1
  }
}


{

  if (_flag) {
    print $0
  } else if ($0 ~ _from) {
    _flag = 1
    print $0
  }

}

Note: the above script was adapted (trimmed down) from from-till.awk, which prints the lines between --from and --till search expressions, so the command-line options and variable names may need adjusting for this specific use case.

... which allows for the use of files as inputs...

head-trimmer.awk --from="^Owner" file-path.txt

... or re-directions such as EOF or pipes...

head-trimmer.awk --from="^Owner" <<'EOF'
find: Filesystem loop detected; `/nfs/.snapshot/nightly.4' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.5' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
find: Filesystem loop detected; `/nfs/.snapshot/nightly.6' has the same device number and inode as a directory which is 2 levels higher in the filesystem hierarchy.
Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD
EOF

... and should parse things to something like...

Owner     RepoName             CreatedDate
val        abc                  Fri          Mar  16  17:01:07  PDT
p1         repo_pc              Wed          Mar  21  11:34:42  PDT
New        fm                   Mon          Mar  19  00:15:51  PD

... Awk scripting enables easier expansion and/or adaptation for other use cases, and using it properly means that one can eliminate unnecessary calls to other programs.

It should be possible to eliminate sed and column from your pipeline with a few more hints:


The BEGIN and END blocks in Awk run before the first input line and after the last one across all inputs (e.g. a list of files), so they are well suited to building the header and column mappings.
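As a rough sketch of that idea (the sample rows and the print_report name are made up for illustration), BEGIN can print the header once before any input and END can print a summary after the last line:

```shell
# Hypothetical helper: prints a header before the data and a row count after it
print_report() {
  awk '
    BEGIN { print "Owner RepoName CreatedDate" }   # runs once, before any input
    { rows++; print $1, $2, $3 }                   # runs for every input line
    END { print "rows:", rows + 0 }                # runs once, after all input
  '
}

printf '%s\n' 'val abc Fri' 'p1 repo_pc Wed' | print_report
```

The header lands first even though it never appears in the input, which is what makes sed '1i ...' unnecessary in this kind of pipeline.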


Using while with getline within Awk allows for parsing output of commands...

#!/usr/bin/awk -f

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--directory=") {
      _directory = substr(ARGV[i], 13)
      delete ARGV[i]
    }
    if (ARGV[i] ~ "^--name=") {
      _name = substr(ARGV[i], 8)
      delete ARGV[i]
    }
    # ... perhaps add other args to parse
  }

  # ... build/print header maybe

}


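# A second BEGIN block runs after the first, so find is invoked once
# rather than once per input record
BEGIN {

  # Single quotes keep the shell from expanding glob characters in _name
  cmd = "find " _directory " -name '" _name "' 2>/dev/null"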
  while (( cmd | getline _line ) > 0) {
    print "_line ->", _line
    # ... do some fancy formatting, use a built-in, or another command
    #     to build desired column output from find results
  }
  close(cmd)

  # ...

}

This can be super handy when tempted to write a Bash script that's just a wrapper around a command with some custom parsing.


There are quite a few handy built-in Awk functions (more so with GAwk), e.g. split and length, and it's possible to add more via the function keyword within an Awk script.
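A minimal sketch of combining the two built-ins with a user-defined function (longest and longest_word_length are hypothetical names, not part of the scripts above):

```shell
# Hypothetical wrapper: prints the length of the longest word in its argument
longest_word_length() {
  awk -v text="$1" '
    # Extra parameters after the gap act as local variables
    function longest(str,    words, n, i, best) {
      n = split(str, words, " ")            # split on runs of whitespace
      for (i = 1; i <= n; i++)
        if (length(words[i]) > best) best = length(words[i])
      return best + 0
    }
    BEGIN { print longest(text) }
  '
}

longest_word_length "Owner RepoName CreatedDate"
```

Something along these lines could be used to work out column widths before printing a table.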


Arrays/Dictionary variables are also possible with Awk, eg...

BEGIN {
  for (i = 1; i < ARGC; i++) {
    if (ARGV[i] ~ "^--from=") {
      _custom_args["from"] = substr(ARGV[i], 8)
      delete ARGV[i]
    } else if (ARGV[i] ~ "^--till=") {
      _custom_args["till"] = substr(ARGV[i], 8)
      delete ARGV[i]
    }
  }
}


{
  # ...
}

But multidimensional arrays such as _something[0,1] should be used with care, because Awk only simulates them with a single string key: _something[0,1] is really _something[0 SUBSEP 1], where SUBSEP defaults to "\034".
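A small sketch of how such keys behave, assuming a POSIX awk (the grid array and demo_subsep name are made up): the subscripts are joined with the built-in SUBSEP variable, so they have to be split apart again when iterating.

```shell
# Hypothetical demo of a simulated two-dimensional array
demo_subsep() {
  awk 'BEGIN {
    grid[0, 1] = "x"                     # stored under the key 0 SUBSEP 1
    for (key in grid) {
      split(key, parts, SUBSEP)          # recover the two subscripts
      print parts[1], parts[2], grid[key]
    }
    if ((0, 1) in grid) print "found"    # membership test for a joined key
  }'
}

demo_subsep
```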


Printing columns as nicely formatted tables with Awk is a bit tricky but also doable via printf formatting options...

#!/usr/bin/awk -f

BEGIN {
  printf("%-8s %-13s %s\n", "Owner", "RepoName", "CreatedDate")
}

Essentially, %-8s tells Awk to reserve at least 8 characters for "Owner" regardless of its length, and %-13s reserves 13. The - left-justifies the value, so shorter strings are padded with spaces on the right; longer strings are printed in full rather than truncated.


To truncate longer strings, printf with a precision such as %.<n>s may be of use...

#!/usr/bin/awk -f

BEGIN {
  printf("%.3s %-13s %s\n", "Owner", "RepoName", "CreatedDate")
}
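Width and precision can also be combined, so that a column is both padded and cut to a fixed size. A hedged sketch applied to data rows (the sample rows and the fixed_width_rows name are invented for illustration):

```shell
# Hypothetical helper: pad or cut the first two fields to fixed widths
fixed_width_rows() {
  awk '{ printf("%-8.8s %-13.13s %s\n", $1, $2, $3) }'
}

printf '%s\n' 'p1 repo_pc Wed' 'averylongowner fm Mon' | fixed_width_rows
```

Here the first field always occupies exactly 8 characters and the second exactly 13, which keeps the columns aligned without calling column -t.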

If ya get stuck feel free to comment and I'll try to swing-by again with more tips.

Upvotes: 0

Douglas Leeder

Reputation: 53310

Those find errors are written to stderr, so they bypass your pipeline entirely. You'll want to redirect them with 2>/dev/null, although that will also prevent you from seeing any other errors from the find command.

find /opt/site/ -name '.log.txt' 2>/dev/null | xargs cat | awk '{$NF=""; print $0}' | xargs sed "/Filesystem/d" | sed '1i Owner RepoName CreatedDate' | column -t
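To see why the redirect has to sit on the find stage itself, here is a small sketch for a Bourne-style shell. The noisy function is a made-up stand-in for find that writes one line to stdout and one to stderr:

```shell
# Hypothetical stand-in for find: one line on stdout, one on stderr
noisy() {
  echo "data line"
  echo "find: Filesystem loop detected" >&2
}

# Only stdout travels through the pipe, so sed never sees the error line;
# 2>/dev/null discards it at the source
noisy 2>/dev/null | sed 's/^/seen by sed: /'
```

Without the 2>/dev/null, the error line would still reach the terminal untouched, no matter what filters come later in the pipeline.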

In general with a complicated command like this, you should break it down when you have errors so that you can work out where the problem is coming from.

Let's split up this command to see what it's doing:

find /opt/site/ -name '.log.txt' 2>/dev/null - find all the files under /opt/site/ named .log.txt

xargs cat - get all their contents, one after the other

awk '{$NF=""; print $0}' - delete the last column

xargs sed "/Filesystem/d" - Treat each entry as a file and delete any lines containing Filesystem from the contents of those files.

sed '1i Owner RepoName CreatedDate' - Insert Owner RepoName CreatedDate on the first line

column -t - Convert the given data into a table

I'd suggest building up the command, and checking the output is correct at each stage.

Several things are surprising about your command:

  1. The find expression matches files named exactly .log.txt, rather than files with that extension (which would need a quoted glob such as '*.log.txt').
  2. The second xargs call treats the contents of the .log.txt files as filenames.
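The first point can be checked in a scratch directory. This is only a sketch; the file names and the demo_dir variable are invented for the demonstration:

```shell
# Scratch directory with made-up file names, purely for illustration
demo_dir=$(mktemp -d)
touch "$demo_dir/.log.txt" "$demo_dir/build.log.txt"

# Exact name: matches only the file literally called .log.txt
find "$demo_dir" -name '.log.txt'

# Quoted glob: matches names ending in .log.txt, including build.log.txt
find "$demo_dir" -name '*.log.txt'
```

Remove the scratch directory with rm -rf "$demo_dir" when done.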

Upvotes: 3

Birei

Reputation: 36262

The following sed command should do the job (use it with an input file or from a pipe):

sed -n '/^Owner/,$ p'

Explanation:

-n             # Disable auto-print.
/^Owner/       # From a line beginning with 'Owner'...
$              # ...until end of input...
p              # print
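A quick sketch of the command on sample input, assuming the unwanted lines actually arrive on stdin (for example because stderr was merged into the pipe). The from_header name and the sample lines are made up:

```shell
# Hypothetical wrapper around the sed command above
from_header() {
  sed -n '/^Owner/,$ p'
}

printf '%s\n' \
  'find: Filesystem loop detected' \
  'Owner RepoName CreatedDate' \
  'val abc Fri' | from_header
```

Everything before the first line starting with Owner is dropped; the header and all later lines pass through unchanged.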

Upvotes: 0

Rob Davis

Reputation: 15772

Sadly, you seem to be using csh or tcsh, where redirecting standard error separately from standard output is difficult. Otherwise Douglas's answer would have worked. But try this:

(find /opt/site/ -name '.log.txt' | xargs cat | awk '{$NF=""; print $0}' | sed '1i Owner RepoName CreatedDate' | column -t > output) >&/dev/null

Note the parens surrounding the bulk of the command. Within those parens is a redirect to send standard output to a file called "output" instead of to your terminal (name it whatever you want -- or replace output with /dev/tty if you really want to see it in your terminal). Outside those parens is a redirect to send the remaining error messages to /dev/null.

The whole thing is a miserable commentary on the longevity of terrible shells.

Upvotes: 0

Chilledrat

Reputation: 2605

You could eliminate the error output from find by appending 2>/dev/null to the find portion of your command, prior to the first pipe. [Edit: this is the best way, and I've voted Douglas' up as he was here first ;) ]

But if you really want to do it with sed or awk (can't think why?), you could amend your awk script to skip lines beginning with 'find:':

awk '/^find:/ {next;} {$NF=""; print $0}' 
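One caveat worth sketching: the awk filter only sees the find: lines if stderr is merged into the pipe, for example with 2>&1 in a Bourne-style shell. The fake_find function and its data row below are made-up stand-ins for the real find | xargs cat stage:

```shell
# Hypothetical stand-in: one error on stderr, one data row on stdout
fake_find() {
  echo "find: Filesystem loop detected" >&2
  echo "val abc Fri Mar 16 17:01:07 PDT 2012"
}

# 2>&1 merges stderr into the pipe so awk can drop the find: lines itself
fake_find 2>&1 | awk '/^find:/ {next} {$NF=""; print $0}'
```

Note that blanking $NF leaves a trailing space on each printed line, which column -t later absorbs.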

Upvotes: 0
