Reputation: 67
Let's say I have a data file containing the following:
1 2 3 4 5
67 88 12 32 22
9 99 34 59 86
17 0 78 0 77
11 0 0 0 43
I would like a script that searches through each column for the number 0. If a column contains a 0, the script should write that entire column to a separate file.
With this data, the output file would look like this:
2 3 4
88 12 32
99 34 59
0 78 0
0 0 0
It'd be great if the code didn't require me to know the exact number of columns and/or rows.
Upvotes: 2
Views: 479
Reputation: 47099
Here is an interesting way of doing it with GNU awk:
parse.awk
# Record number of columns (assuming all rows have the same number of fields)
NR == 1 { n = NF }
# First parse: Remember which columns contain `pat`
FNR == NR {
  for(i=1; i<=NF; i++)
    if($i == pat) {
      h[i] = i
      last = i>last ? i : last
    }
  next
}
# Before second parse: switch to reading one field at a time
ENDFILE {
  RS="[ \t\n]+"
}
# Second parse: print field if its column index (derived from the
# current record number modulo number-of-columns) is in the `h` hash
{ m = (FNR - 1) % n + 1 } # 1-based index, so the last column maps to n, not 0
m in h {
  ORS = (m == last) ? "\n" : OFS # print new-line after last column
  print $1
}
Run it like this for example:
awk -f parse.awk pat=0 infile infile
Output:
2 3 4
88 12 32
99 34 59
0 78 0
0 0 0
Or with OFS='\t':
awk -f parse.awk pat=0 OFS='\t' infile infile
Output:
2 3 4
88 12 32
99 34 59
0 78 0
0 0 0
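The switch in the ENDFILE block relies on RS accepting a regular expression, so that every whitespace-separated token becomes its own record. Here is a minimal sketch of that idea, assuming an awk that supports a regex RS (GNU awk does; POSIX does not require it). The function name fields_as_records and its column-count argument are made up for illustration, and the 1-based column index is computed as (FNR - 1) % n + 1 so that the last column maps to n rather than 0:

```shell
# Hypothetical helper: read "$1" one token at a time and report which
# column each token belongs to, given the column count "$2".
fields_as_records() {
    awk -v RS='[ \t\n]+' -v n="$2" '
        { printf "%s -> column %d\n", $0, (FNR - 1) % n + 1 }
    ' "$1"
}

# Usage with a throwaway 2x3 file
tmp=$(mktemp)
printf 'a b c\nd e f\n' > "$tmp"
fields_as_records "$tmp" 3
rm -f "$tmp"
```

With GNU awk this should list a, b, c as columns 1 to 3 and then d, e, f as columns 1 to 3 again.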
Upvotes: 0
Reputation: 10039
sed '#n
# init and load line in buffer (1st line copied, other added)
s/.*/>& /;1!H;1h
# at end of file, load buffer in working area
$ {x
:cycle
# keep column if zero inside
/>[[:blank:]]*0[[:blank:]]/ s/>\(\([[:blank:]]*[0-9]\{1,\}\)[[:blank:]][[:graph:][:blank:]]*\)/\2>\1/g
# remove treated column
s/>[[:blank:]]*[0-9]\{1,\}\([[:blank:]]\{1,\}[[:graph:][:blank:]]*\)/>\1/g
# is there another column to treat ?
/>[[:blank:]]*[0-9][[:graph:][:blank:]]/ b cycle
# print result after cleanup
s/>//gp
}' YourFile
Use --posix on GNU sed.
Upvotes: 1
Reputation: 113834
This will do what you want. It does not require knowing how many rows or columns are present.
$ awk 'FNR==NR{for (i=1;i<=NF;i++)if ($i==0)a[i]=1;next} {tab="";for (i=1;i<=NF;i++)if (a[i]){printf "%s%s",tab,$i; tab="\t"};print ""}' file file
2 3 4
88 12 32
99 34 59
0 78 0
0 0 0
Because the file name is specified twice on the command line, the awk script reads the file twice: the first time to look for zeros, the second time to print.
FNR==NR{for (i=1;i<=NF;i++)if ($i==0)a[i]=1;next}
On the first pass through the file, a[i] is set to one for any column i that has a zero in it.
This code applies only to the first pass because of the condition FNR==NR. NR is the total number of records (lines) that we have read so far. FNR is the number of records (lines) that we have read so far from the current file. Thus, when FNR==NR, we are still reading the first file. The next at the end of the commands tells awk to skip the remaining commands and start over on the next line.
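To see the FNR==NR condition at work, here is a small sketch that prints both counters for every line when the same file is named twice. The function name show_passes and the throwaway file are hypothetical, used only for illustration:

```shell
# Hypothetical helper: show NR, FNR, and which pass each line belongs to
# when the same file is given to awk twice.
show_passes() {
    awk '{
        printf "NR=%d FNR=%d %s\n", NR, FNR,
               (FNR == NR ? "first pass" : "second pass")
    }' "$1" "$1"
}

# Usage with a throwaway two-line file
tmp=$(mktemp)
printf 'a b\nc d\n' > "$tmp"
show_passes "$tmp"
rm -f "$tmp"
# prints:
# NR=1 FNR=1 first pass
# NR=2 FNR=2 first pass
# NR=3 FNR=1 second pass
# NR=4 FNR=2 second pass
```

FNR restarts at 1 when the second copy of the file begins while NR keeps counting, so the two are equal only during the first pass.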
tab="";for (i=1;i<=NF;i++)if (a[i]){printf "%s%s",tab,$i; tab="\t"};print ""
When we are reading through the file for the second time, we print out each column i for which a[i] is non-zero. I chose tab-separated output but, by simply adjusting the printf statement, any format could be used.
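As one possible way to make the output format adjustable, the same two-pass idea can be wrapped in a shell function that takes the separator as an argument. This is only a sketch: the function name zero_columns, the array name keep, and the default of a single space are assumptions, not part of the one-liner above.

```shell
# Hypothetical helper: print every column of "$1" that contains a 0,
# joining fields with the separator given as "$2" (default: one space).
zero_columns() {
    awk -v OFS="${2:- }" '
        FNR == NR {                 # first pass: flag columns holding a 0
            for (i = 1; i <= NF; i++)
                if ($i == 0) keep[i] = 1
            next
        }
        {                           # second pass: print only flagged columns
            sep = ""
            for (i = 1; i <= NF; i++)
                if (i in keep) { printf "%s%s", sep, $i; sep = OFS }
            print ""
        }
    ' "$1" "$1"
}
```

For example, zero_columns file ',' would give comma-separated output, and zero_columns file '\t' should reproduce the tab-separated format, since awk expands escape sequences in -v assignments.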
Upvotes: 2