Stedy

Reputation: 7469

print all but select fields in awk

I have a large file with hundreds of columns. I want to remove only the third and fourth columns and print the rest to a file. My initial idea was to write an awk script like awk '{print $1, $2, for (i=$5; i <= NF; i++) print $i }' file > outfile. However, this code does not work.

I then tried:

awk '{for(i = 1; i<=NF; i++)
if(i == 3 || i == 4) continue
else
print($i)}' file > outfile

But this just printed every field on its own line, all in one column. I could split this into two scripts and combine their output with Unix paste, but this seems like something that should be possible in one line.

Upvotes: 22

Views: 38565

Answers (6)

NeronLeVelu

Reputation: 10039

The hard but generic way (overkill for a simple one-liner):

awk -v "Exclude=3:4" '
   # load the field numbers to exclude
   BEGIN {
      Count = split(Exclude, aTmp, ":")
      for (i = 1; i <= Count; i++) aExc[aTmp[i]] = 1
   }

   # treat each line, keeping only the wanted fields
   {
      Result = ""
      Sep = ""
      for (i = 1; i <= NF; i++) {
         # skip excluded fields
         if (!(i in aExc)) {
            # put a separator before every kept field except the first
            Result = Result Sep $i
            Sep = OFS
         }
      }
      print Result
   }' YourFile
  • you can specify any fields you want to exclude
    • list the field numbers in the variable Exclude, separated by :, on the first line
  • separators are correct in both position and count
  • the code is "expanded" for readability
  • the output is not exactly the input minus the excluded fields, because the output separator (OFS) replaces the original separators (e.g. two spaces or a tab become a single space with the default settings)
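To try the generic script on other delimiters, a minimal sketch can wrap it in a shell function. The name drop_fields and its argument order (exclusion list, then a single-character separator, then optional files) are hypothetical, chosen only for this example. It also sets OFS to the input separator, which sidesteps the separator change described in the last point, and builds each line with a separator variable so that an empty leading field keeps its delimiter.

```shell
# Hypothetical helper wrapping the generic exclusion script.
#   $1: field numbers to drop, separated by ":" (e.g. 3:4)
#   $2: single-character field separator, also used for output
#   remaining arguments: input files (stdin if none)
drop_fields() {
  excl=$1
  sep=$2
  shift 2
  awk -F "$sep" -v OFS="$sep" -v "Exclude=$excl" '
    BEGIN {
      Count = split(Exclude, aTmp, ":")
      for (i = 1; i <= Count; i++) aExc[aTmp[i]] = 1
    }
    {
      Result = ""; Sep = ""
      for (i = 1; i <= NF; i++)
        if (!(i in aExc)) {      # keep only fields not listed in Exclude
          Result = Result Sep $i
          Sep = OFS              # separator before every kept field but the first
        }
      print Result
    }' "$@"
}

printf '1:2:3:4:5:6\n' | drop_fields 3:4 :    # prints 1:2:5:6
printf '::c:d:e\n' | drop_fields 3:4 :        # prints ::e (empty fields 1 and 2 kept)
```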

Upvotes: 0

progz

Reputation: 327

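Yes, it's possible to just set the third and fourth columns to an empty string. In addition, field $1 can be set to itself ($1=$1), a common idiom that makes awk rebuild the entire current line $0 in one go, replacing every input field separator (delimiter, here :) with the output field separator OFS. Strictly speaking, assigning to $3 already triggers the same rebuild.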

echo 1:2:3:4:5:6:7:8:9:10 | awk -F: '{ $1=$1; $3=""; $4=""; print $0}'
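Because the rebuilt line uses OFS, which is a single space by default, the command above prints 1 2, then three spaces, then 5 6 7 8 9 10: the colons are gone. A minimal sketch, assuming you want to keep the colon as the output delimiter, sets OFS as well. Fields 3 and 4 still remain as empty fields:

```shell
# Assigning to $3 and $4 rebuilds $0 with OFS; with OFS=":" the colons survive
printf '1:2:3:4:5:6:7:8:9:10\n' | awk -F: -v OFS=: '{ $3 = ""; $4 = ""; print }'
# prints 1:2:::5:6:7:8:9:10
```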

Upvotes: 4

thomascirca

Reputation: 933

What about something like:

cut -f1-2,5- SOURCEFILE > DESTFILE

It prints the first two columns, skips the 3rd and 4th, and then prints from the 5th to the end. Note that cut splits on tabs by default; use -d to name a different single-character delimiter.
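A quick way to check the delimiter behaviour, as a sketch assuming a POSIX cut: tab-separated input works without options, while other single-character delimiters need -d. cut does not collapse runs of delimiters, so with -d' ' every extra space counts as an empty field.

```shell
# Tab-separated input: cut's default delimiter, no -d needed
printf 'a\tb\tc\td\te\tf\n' | cut -f1-2,5-

# Colon-separated input: name the delimiter with -d (prints 1:2:5:6)
printf '1:2:3:4:5:6\n' | cut -d: -f1-2,5-

# Two spaces after "1" create an empty field 2, so the wrong fields survive
# (prints "1  4 5", not "1 2 5")
printf '1  2 3 4 5\n' | cut -d' ' -f1-2,5-
```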

Upvotes: 18

matchew

Reputation: 19665

Say you have a tab-delimited file that looks like the following:

temp.txt

field1 field2 field3 field4 field5 field6
field1 field2 field3 field4 field5 field6
field1 field2 field3 field4 field5 field6

Running the following removes fields 3 and 4 and prints from field 5 to the end of the line:

awk '{print $1"\t"$2"\t"substr($0, index($0,$5))}' temp.txt

field1 field2 field5 field6
field1 field2 field5 field6
field1 field2 field5 field6

My examples print to stdout. > newFile sends stdout to newFile (overwriting it), and >> newFile appends to newFile.

So you may want to use the following:

awk '{print $1"\t"$2"\t"substr($0, index($0,$5))}' temp.txt > newFile.txt

Some will argue for cut:

cut -f1,2,5- temp.txt

which produces the same output. cut is great for simplicity, but it does not handle inconsistent delimiters, such as a mixture of different whitespace characters. In this case, however, cut may be what you are after.

You could also accomplish this in Perl, Python, Ruby, and many others, but here is the simplest awk solution.
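One caveat with index($0,$5): it finds the first occurrence of the fifth field's text anywhere in the line, so if that text also appears earlier (for example $1 is x5 and $5 is 5), substr returns too much of the line. A minimal sketch that avoids string searching by rebuilding the line from fields, assuming genuinely tab-separated input; the function name drop_3_4_tab is hypothetical:

```shell
# Hypothetical helper: drop fields 3 and 4 from tab-separated input.
# Reads the named files, or stdin if none are given.
drop_3_4_tab() {
  awk -F'\t' -v OFS='\t' '{
    out = $1 OFS $2                      # always keep the first two fields
    for (i = 5; i <= NF; i++)            # then append field 5 onward
      out = out OFS $i
    print out
  }' "$@"
}

# prints x5<TAB>y<TAB>5<TAB>end
printf 'x5\ty\tz\tw\t5\tend\n' | drop_3_4_tab
```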

Upvotes: 7

jim

Reputation: 121

How about just setting the third and fourth columns to an empty string:

echo 1 2 3 4 5 6 7 8 9 10 |
awk -F" " '{ $3="";  $4=""; print}'
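Blanking the fields leaves their separators behind, so this prints 1 2, then three spaces, then 5 6 7 8 9 10. If the input is whitespace-separated and you want single spaces, a minimal sketch re-splits the line and rebuilds it once more; the function name drop_3_4_squeeze is hypothetical. It relies on the default FS, where a run of blanks counts as one separator, so it also collapses any repeated spaces that were in the original input:

```shell
# Hypothetical helper: drop fields 3 and 4 from whitespace-separated input
drop_3_4_squeeze() {
  awk '{
    $3 = ""; $4 = ""   # blank fields 3 and 4; $0 is rebuilt with OFS
    $0 = $0            # re-split $0: the leftover run of spaces is one separator
    $1 = $1            # rebuild $0 again from the remaining fields
    print
  }' "$@"
}

echo 1 2 3 4 5 6 7 8 9 10 | drop_3_4_squeeze   # prints 1 2 5 6 7 8 9 10
```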

Upvotes: 12

Carl Norum

Reputation: 225112

Your first try was pretty close. Modifying it to use printf and including the field separators worked for me:

awk '{printf "%s%s%s", $1, FS, $2; for (i = 5; i <= NF; i++) printf "%s%s", FS, $i; print ""}' file > outfile
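Two details matter in a printf one-liner like this. printf treats its first argument as a format string, so a field containing % (for example 10%) would be misread if the data were passed there directly; routing the data through %s avoids that. And NL is not a built-in awk variable, so print NL only ends the line because an unset variable is empty; print "" says that directly. Printing FS as the separator also assumes FS is the default or a single literal character, not a regex. A sketch of the same approach as a hypothetical shell function, drop_3_4_printf, with a test input containing %:

```shell
# Hypothetical helper: keep fields 1, 2 and 5 onward, joined with FS
# (a single space with the default FS)
drop_3_4_printf() {
  awk '{
    printf "%s%s%s", $1, FS, $2            # data only ever goes through %s
    for (i = 5; i <= NF; i++) printf "%s%s", FS, $i
    print ""                               # end the output line
  }' "$@"
}

printf '10%% 2 3 4 5%%d 6\n' | drop_3_4_printf   # prints 10% 2 5%d 6
```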

Upvotes: 20
