Vonton

Reputation: 3362

Remove empty columns by awk

I have an input file which is tab-delimited, and I want to remove all of its empty columns. The empty columns are: $13, $14, $15, $84, $85, $86, $87, $88, $89, $91, $94

INPUT: tsv file with more than 90 columns

a b   d e   g...  
a b   d e   g...

OUTPUT: tsv file without empty columns

a b d e g....
a b d e g...

Thank you

Upvotes: 2

Views: 4462

Answers (2)

kvantour

Reputation: 26481

remove ALL empty columns:

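If you have a tab-delimited file with empty columns and you want to remove all of them, it implies that the file contains runs of consecutive tabs. Hence you could just replace each run with a single tab, and then delete the leading tab that is left behind if the first column was empty as well. Note that this works line by line, so a field that is empty in only some lines is removed from those lines only, and that `\t` and `\+` are GNU sed syntax: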

sed 's/\t\+/\t/g;s/^\t//' <file>
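If instead a column should be dropped only when it is empty on every line, one possible approach is to read the file twice with awk. The following is a minimal sketch under that assumption; the function name `drop_empty_columns` is hypothetical, and it expects a regular file (not a pipe) because the file is read twice:

```shell
# Hypothetical helper: drop each column that is empty on every line of a TSV file.
drop_empty_columns() {
    # The same file is passed twice: pass 1 records which columns hold data,
    # pass 2 prints only those columns.
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR {                          # pass 1: first read of the file
             for (i = 1; i <= NF; i++) if ($i != "") keep[i] = 1
             if (NF > max) max = NF
             next
         }
         {                                    # pass 2: second read of the file
             out = ""; sep = ""
             for (i = 1; i <= max; i++)
                 if (i in keep) { out = out sep $i; sep = OFS }
             print out
         }' "$1" "$1"
}
```

It could be called as `drop_empty_columns input.tsv > output.tsv`, where both file names are placeholders. Unlike the sed command, this keeps the remaining columns aligned across lines.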

remove SOME columns: see Ed Morton's answer, or just use cut (the --complement option is a GNU extension):

cut --complement -f 13,14,15,84,85,86,87,88,89,91,94 <file>

remove selected columns if and only if they are empty:

Basically a simple adaptation of Ed Morton's answer:

awk -v col=13,14,15,84,85,86,87,88,89,91,94 '
     BEGIN{FS=OFS="\t"; n=split(col,a,",")}   # -v sets col before BEGIN runs
     { for(i=1;i<=n;++i) if ($a[i]=="") $a[i]=RS; gsub("(^|"FS")"RS,"") }
     1' <file>
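To see how the "if and only if they are empty" check behaves, here is a small sketch that wraps the same idea in a hypothetical shell function, `drop_if_empty`. It passes the column list with `-v` so that it is already set when `BEGIN` runs. The check is made per line, so a listed column is removed only from the lines where it is empty:

```shell
# Hypothetical wrapper: $1 is a comma-separated list of column numbers;
# tab-separated input is read from stdin.
drop_if_empty() {
    awk -v col="$1" '
        BEGIN { FS = OFS = "\t"; n = split(col, a, ",") }
        {
            # Mark each listed field that is empty on this line with RS,
            # which cannot otherwise occur inside a record.
            for (i = 1; i <= n; ++i) if ($a[i] == "") $a[i] = RS
            # Delete each marker together with the separator before it.
            gsub("(^|" FS ")" RS, "")
        }
        1'
}

# Column 2 is empty on the first line only, so only that line loses it.
printf 'a\t\tc\nx\ty\tz\n' | drop_if_empty 2
```

The first output line has two fields (a, c) and the second still has three (x, y, z), so lines can end up with different field counts when a column is empty in only some of them.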

Upvotes: 3

Ed Morton

Reputation: 203607

This might be what you want:

$ printf 'a\tb\tc\td\te\n'
a       b       c       d       e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=""} 1'
a               c               e

$ printf 'a\tb\tc\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a       c       e

Note that the above doesn't remove all empty columns, as some potential solutions might do. It removes exactly the column numbers you want removed, whether or not they are empty:

$ printf 'a\tb\t\td\te\n'
a       b               d       e

$ printf 'a\tb\t\td\te\n' | awk 'BEGIN{FS=OFS="\t"} {$2=$4=RS; gsub("(^|"FS")"RS,"")} 1'
a               e

Upvotes: 6
