Reputation: 7469

print all but select fields in awk

I have a large file with hundreds of columns. I want to remove only the third and fourth columns and print the rest to a file. My initial idea was an awk script like awk '{print $1, $2, for (i=$5; i <= NF; i++) print $i }' file > outfile. However, this code does not work.

I then tried:

awk '{for(i = 1; i<=NF; i++)
if(i == 3 || i == 4) continue
else
print($i)}' file > outfile

But this just printed everything out in one field. It would be possible to split this up into two scripts and combine them with unix paste, but this seems like something that should be doable in one line.

Upvotes: 22

Answers (6)

NeronLeVelu

Reputation: 10039

The hard but generic way (overkill if you only need a simple one-liner):

awk -v "Exclude=3:4:5" '
   # load exclusion
   BEGIN{
      Count=split(Exclude, aTmp, ":")
      for( i = 1; i <= Count; i++) aExc[ aTmp[ i]]=1
      }

   # treat each line, taking only wanted field
   {
    Result=""
    for( i = 1; i <= NF; i++) {
       # field to take ?
       if( ! aExc[ i]) {
         # first element or add a separator before
         if( Result != "") Result=Result OFS $i
          else Result=$i
         }
       }

    print Result
   }' YourFile

You can specify any fields that you want to exclude:
- list the field indexes in the variable Exclude, separated by a :, on the first line
- separators are emitted in the correct places and in the correct quantity
- the code is "expanded" for better understanding
- the final result is not exactly the input minus the excluded fields, because the output separator is used instead of the original separator (e.g. two spaces or a tab are changed to one space with the default behaviour)
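
If the original delimiter should survive in the output, the same exclusion-list idea can be run with FS and OFS set to the same character. The following is only a sketch under that assumption: the wrapper name drop_fields and its argument order are hypothetical, and the list is comma-separated here so that it cannot clash with a : delimiter.

```shell
# drop_fields is a hypothetical helper name, not a standard tool.
# Usage: drop_fields DELIM LIST [FILE...]   e.g. drop_fields : 3,4 data.txt
drop_fields() {
    delim=$1
    list=$2
    shift 2
    awk -F"$delim" -v OFS="$delim" -v Exclude="$list" '
        BEGIN {
            # Turn the list "3,4" into a lookup table skip["3"], skip["4"].
            n = split(Exclude, tmp, ",")
            for (i = 1; i <= n; i++) skip[tmp[i]] = 1
        }
        {
            out = ""
            first = 1
            for (i = 1; i <= NF; i++) {
                if (i in skip) continue
                # A flag (not an empty-string test) decides whether a
                # separator is needed, so empty leading fields are kept.
                out = (first ? $i : out OFS $i)
                first = 0
            }
            print out
        }' "$@"
}

printf '1:2:3:4:5:6\n' | drop_fields : 3,4
# prints 1:2:5:6
```

With no FILE argument the function reads standard input, as in the example line.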

Upvotes: 0

progz

Reputation: 327

Yes, it's possible to just set the third and fourth columns to an empty string. In addition, field $1 can be set to itself ($1=$1): assigning to any field makes awk rebuild the entire current line $0, replacing the input field separator (delimiter) : with the output field separator in one go.

echo 1:2:3:4:5:6:7:8:9:10 | awk -F: '{ $1=$1; $3=""; $4=""; print $0}'
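
Note that the command above prints the fields separated by spaces, because the rebuilt line uses the default output separator. A minimal sketch, assuming the colons should be kept, sets OFS to match:

```shell
# Blank fields 3 and 4 and keep ":" as the output separator.
printf '1:2:3:4:5:6:7:8:9:10\n' |
awk -F: -v OFS=: '{ $3 = ""; $4 = ""; print }'
# prints 1:2:::5:6:7:8:9:10
```

The blanked fields are still there as empty columns, which is why three colons appear between 2 and 5. Whether that matters depends on what reads the file next.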

Upvotes: 4

thomascirca

Reputation: 933

What about something like:

cat SOURCEFILE | cut -f1-2,5- >> DESTFILE

It prints the first two columns, skips the 3rd and 4th, and then prints from the 5th onwards to the end of the line.
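
cut splits on a tab by default. For a file with another single-character delimiter, a sketch like the following (assuming a colon-delimited input) passes it with -d:

```shell
# -d sets the field delimiter; -f1-2,5- keeps fields 1-2 and 5 onward.
printf '1:2:3:4:5:6\n' | cut -d: -f1-2,5-
# prints 1:2:5:6
```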

Upvotes: 18

matchew

Reputation: 19665

Say you have a tab delimited file that looks like the following:

temp.txt

field1 field2 field3 field4 field5 field6
field1 field2 field3 field4 field5 field6
field1 field2 field3 field4 field5 field6

Running the following removes fields 3 and 4 and prints the rest of each line:

awk '{print $1"\t"$2"\t"substr($0, index($0,$5))}' temp.txt

field1 field2 field5 field6
field1 field2 field5 field6
field1 field2 field5 field6

My example(s) print to stdout. > newFile will send stdout to newFile and >> newFile will append to newFile.

So you may want to use the following:

awk '{print $1"\t"$2"\t"substr($0, index($0,$5))}' temp.txt > newFile.txt

Some will argue for cut:

cut -f1,2,5- temp.txt

which produces the same output. cut is great for simplicity but does not handle inconsistent delimiters, for example a mixture of different whitespace characters. However, in this case cut may be what you are after.

You could also accomplish this in Perl, Python, Ruby, and many others, but here is the simplest awk solution.
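
One caveat with the substr/index command above: index($0,$5) finds the first occurrence of the text of field 5, so if that text also appears earlier in the line, the output starts too early. The sketch below shows the failure on a made-up line and one possible variant that strips the first four whitespace-separated fields from a copy of the line instead. It assumes the default whitespace field splitting.

```shell
# Failure case: field 5 is "a", which is also the first character of the
# line, so index() returns 1 and the whole line follows the second tab.
printf 'a b c a a f\n' |
awk '{ print $1 "\t" $2 "\t" substr($0, index($0, $5)) }'

# Variant: remove four leading fields from a copy of $0, then print.
printf 'a b c a a f\n' |
awk '{
    rest = $0
    for (i = 1; i <= 4; i++) sub(/^[ \t]*[^ \t]+/, "", rest)  # drop one field
    sub(/^[ \t]+/, "", rest)                                   # drop leading blanks
    print $1 "\t" $2 "\t" rest
}'
```

The variant leaves the spacing of the remaining fields untouched, because it never reassigns a field.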

Upvotes: 7

jim

Reputation: 121

How about just setting the third and fourth columns to an empty string:

echo 1 2 3 4 5 6 7 8 9 10 |
awk -F" " '{ $3="";  $4=""; print}'

Upvotes: 12

Carl Norum

Reputation: 225112

Your first try was pretty close. Modifying it to use printf and including the field separators worked for me:

awk '{printf "%s%s%s", $1, FS, $2; for (i = 5; i <= NF; i++) printf "%s%s", FS, $i; print ""}'

Upvotes: 20
