Brad D

Reputation: 67

How to print columns containing value

Let's say I have a data file containing the following:

 1     2     3     4     5
67    88    12    32    22
 9    99    34    59    86
17     0    78     0    77
11     0     0     0    43

I would like code that searches each column for the number 0. If a 0 is found in a column, the code should write that entire column to a separate file.

With this data, the output file would look like this:

 2     3     4
88    12    32
99    34    59
 0    78     0
 0     0     0

It'd be great if the code didn't require me to know the exact number of columns and/or rows.

Upvotes: 2

Views: 479

Answers (3)

Thor

Reputation: 47099

Here is an interesting way of doing it with GNU awk:

parse.awk

# Record number of columns (assuming all rows have the same number of fields)
NR == 1 { n = NF } 

# First parse: Remember which columns contain `pat`
FNR == NR { 
  for(i=1; i<=NF; i++) 
    if($i == pat) {
      h[i] = i
      last = i>last ? i : last
    }
  next
} 

# Before second parse: switch to reading one field at a time
ENDFILE { 
  RS="[ \t\n]+"
} 

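# Second parse: print field if its column index (derived from the
#               current record number and number of columns) is in the `h` hash
{ m = (FNR - 1) % n + 1 }  # 1..n; a plain FNR % n would give 0 for the last column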

m in h {
  ORS = (m == last) ? "\n" : OFS  # print new-line after last column
  print $1
}

Run it like this for example:

awk -f parse.awk pat=0 infile infile

Output:

2 3 4
88 12 32
99 34 59
0 78 0
0 0 0

Or with OFS='\t':

awk -f parse.awk pat=0 OFS='\t' infile infile

Output:

2   3   4
88  12  32
99  34  59
0   78  0
0   0   0
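
To see why the second pass can recover the column from the record number, here is a small sketch that reads a 2x3 grid one field per record. It assumes an awk that accepts a regular expression as RS (GNU awk does; POSIX does not guarantee it), and the hard-coded n=3 and the variable name mapping are only for illustration. Note that a plain NR % n would yield 0 for every last-column record, so the 1-based form (NR - 1) % n + 1 is used here.

```shell
# Feed a 2x3 grid to awk one field per record (n=3 is hard-coded for illustration).
mapping=$(printf '1 2 3\n4 5 6\n' |
  awk -v n=3 '
    BEGIN { RS = "[ \t\n]+" }   # regex RS: an extension, not guaranteed by POSIX
    { print "record " NR ": value " $0 ", column " ((NR - 1) % n + 1) }
  ')
printf '%s\n' "$mapping"
```

Record 4 (the value 4) maps back to column 1, which is how the script knows where each row starts and ends.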

Upvotes: 0

NeronLeVelu

Reputation: 10039

sed '#n
# init and load line in buffer (1st line copied, other added)
s/.*/>& /;1!H;1h

# at end of file, load buffer in working area
$ {x
:cycle
# keep column if zero inside
   />[[:blank:]]*0[[:blank:]]/ s/>\(\([[:blank:]]*[0-9]\{1,\}\)[[:blank:]][[:graph:][:blank:]]*\)/\2>\1/g
# remove treated column
   s/>[[:blank:]]*[0-9]\{1,\}\([[:blank:]]\{1,\}[[:graph:][:blank:]]*\)/>\1/g
# is there another column to treat ?
   />[[:blank:]]*[0-9][[:graph:][:blank:]]/ b cycle

# print result after cleanup
   s/>//gp
   }' YourFile
  • Self-commented sed script
  • POSIX version, so use --posix with GNU sed

Upvotes: 1

John1024

Reputation: 113834

This will do what you want. It does not require knowing anything about how many rows or columns are present.

$ awk 'FNR==NR{for (i=1;i<=NF;i++)if ($i==0)a[i]=1;next} {tab="";for (i=1;i<=NF;i++)if (a[i]){printf "%s%s",tab,$i; tab="\t"};print ""}' file file
2       3       4
88      12      32
99      34      59
0       78      0
0       0       0

How it works

Because the file name is specified twice on the command line, awk reads the file twice: the first time to look for zeros, the second time to print.

  • FNR==NR{for (i=1;i<=NF;i++)if ($i==0)a[i]=1;next}

    On the first run through the file, a[i] is set to one for any column i that has a zero in it.

    This code only applies to the first run through because of the condition FNR==NR. NR is the total number of records (lines) that we have read so far. FNR is the number of records (lines) that we have read so far from the current file. Thus, when FNR==NR, we are still reading the first file. The next at the end of the commands tells awk to skip the remaining commands and start over on the next line.

  • tab="";for (i=1;i<=NF;i++)if (a[i]){printf "%s%s",tab,$i; tab="\t"};print ""

    When we are reading through the file for the second time, we print out each column i for which a[i] is non-zero. I chose tab-separated output but, by simply adjusting the printf statement, any format could be used.
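
Since the question asks for the result in a separate file, here is a minimal sketch of the same two-pass idea with the output redirected and the separator passed in as a variable. The file names data.txt and zero_columns.txt and the variable names pat, sep and keep are placeholders chosen for illustration; they are not part of the command above.

```shell
# Recreate the sample data from the question (data.txt is a placeholder name).
cat > data.txt <<'EOF'
 1     2     3     4     5
67    88    12    32    22
 9    99    34    59    86
17     0    78     0    77
11     0     0     0    43
EOF

# Pass 1 (FNR == NR): flag every column that contains pat.
# Pass 2: print only the flagged columns, joined by sep.
awk -v pat=0 -v sep=' ' '
  FNR == NR { for (i = 1; i <= NF; i++) if ($i == pat) keep[i] = 1; next }
  {
    line = ""; first = 1
    for (i = 1; i <= NF; i++)
      if (i in keep) { line = line (first ? "" : sep) $i; first = 0 }
    print line
  }
' data.txt data.txt > zero_columns.txt
```

Passing -v sep='\t' instead should give tab-separated output, since awk interprets escape sequences in -v assignments.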

Upvotes: 2
