Reputation: 19
Hi, this might be a basic question for many, but it has managed to eat a couple of hours of my time.
I have a large data file produced by running a script. The file contains around 15 columns and around 100,000 rows. I want to search through the file and check columns 4, 5, 6, 7 and 8 for specific values (and strings). I know I can cut the columns out separately and view them, or use forward search ("/") in less. The problem is that the second and third columns also contain the value I search for (in almost every other line). I only need the matches in columns 4, 5, 6, 7 and 8 for interpreting the results, and I also need to see the adjacent columns. How can I accomplish this? I do not want to use external languages such as R, Python or Perl; I am looking for a solution using command-line tools.
I use the following command to view the file:
bzcat myfile.tsv.bz2 | column -t | less -S
Any input will be appreciated.
Example of what the data looks like (it is biological data within specific intervals):
col1 strt end Sample1 Sample2 Sample3 Sample4 Sample5 p.val1 p.val2 . ID
ABC 1100 1200 2 2 2 2 3 NA 0.27403 PLD4
BCD 1200 1300 4 3 4 4 2 0.88831 0.37662 CYP46A1
CDE 1300 1400 2 1 4 2 1 0.77922 0.00519 CEBPE
DEF 1400 1500 6 4 4 4 4 0.88182 NA BRCA
EFG 1500 1600 2 6 8 10 3 0.00779 0.01558 BRCA
Say I want to view the file as a whole but restrict my search to columns 4, 5, 6, 7 and 8. ~M
Upvotes: 0
Views: 825
Reputation: 204558
Until you edit your question to provide more info, is this what you want?:
$ awk '$4==1 && $6==4' file
BCD 2 4 1 1 4 2
The above was run against your posted sample input file:
$ cat file
col1 srt end col4 col5 col6 col7
ABC 1 2 1 1 5 2
BCD 2 4 1 1 4 2
CDE 4 6 6 5 2 5
DEF 6 8 4 4 4 4
EFG 8 10 4 4 3 4
Given your comment below, is this what you want:
$ awk '{print $0 ($4==1 && $6==4 ? " <--- HERE I AM!" : "")}' file
col1 srt end col4 col5 col6 col7
ABC 1 2 1 1 5 2
BCD 2 4 1 1 4 2 <--- HERE I AM!
CDE 4 6 6 5 2 5
DEF 6 8 4 4 4 4
EFG 8 10 4 4 3 4
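If the goal is to test only columns 4 through 8 and still see every line, the same idea can be extended with a loop over those fields. This is only a sketch, assuming the real file is tab-separated and that an exact value is wanted; the names `sample`, `flag_rows`, `val`, `first` and `last` and the sample rows are made up for illustration and do not come from the question.

```shell
# Hypothetical tab-separated sample modelled on the question's data.
# Row BCD deliberately has a 6 in column 2, which must NOT be flagged.
sample() {
    printf 'col1\tstrt\tend\tS1\tS2\tS3\tS4\tS5\tID\n'
    printf 'ABC\t1100\t1200\t2\t2\t2\t2\t3\tPLD4\n'
    printf 'BCD\t6\t1300\t4\t3\t4\t4\t2\tCYP46A1\n'
    printf 'EFG\t1500\t1600\t2\t6\t8\t10\t3\tBRCA\n'
}

# Print every line from stdin; append a marker when any of the
# fields first..last equals the value passed as the first argument.
flag_rows() {
    awk -F'\t' -v val="$1" -v first=4 -v last=8 '
        {
            hit = 0
            for (i = first; i <= last; i++)
                if ($i == val) { hit = 1; break }
            print $0 (hit ? "\t<--- HERE I AM!" : "")
        }'
}

sample | flag_rows 6
```

With the real data this would presumably slot into the existing pipeline as `bzcat myfile.tsv.bz2 | flag_rows 6 | column -t | less -S`, after which searching for `/HERE` in less jumps between the flagged rows while the adjacent columns stay visible.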
Upvotes: 1
Reputation: 53508
OK, so I'm going to assume that tsv means tab-separated values.
I would use perl for this:
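#!/usr/bin/perl
use strict;
use warnings;

my $search_term = "some_term";
# 1-based column numbers, as counted in the question
my @columns_to_check = ( 4, 5, 6, 7, 8 );

while ( <> ) {
    my @cols = split;
    for my $colnum ( @columns_to_check ) {
        # Perl arrays are zero-based, so column N is $cols[N - 1]
        my $value = $cols[ $colnum - 1 ];
        next unless defined $value;    # skip lines with too few columns
        if ( $value =~ m/$search_term/ ) {
            print;
            last;
        }
    }
}
Note: $search_term is treated as a regular expression.
Also: Perl starts arrays at zero, so column 1 is $cols[0]; the code subtracts 1 from each column number to account for this.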
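Since the question asks for a solution without Perl, roughly the same per-column regex check could be sketched in awk. This assumes tab-separated input with a header on the first line; the function names `sample` and `search_cols`, the variable `re` and the sample rows are illustrative only.

```shell
# Hypothetical tab-separated sample; row BCD has a 6 in column 2 only.
sample() {
    printf 'col1\tstrt\tend\tS1\tS2\tS3\tS4\tS5\tID\n'
    printf 'ABC\t1100\t1200\t2\t2\t2\t2\t3\tPLD4\n'
    printf 'BCD\t6\t1300\t4\t3\t4\t4\t2\tCYP46A1\n'
    printf 'DEF\t1400\t1500\t6\t4\t4\t4\t4\tBRCA\n'
    printf 'EFG\t1500\t1600\t2\t6\t8\t10\t3\tBRCA\n'
}

# Print the header plus every row where any of fields 4..8 matches
# the regular expression passed as the first argument.
search_cols() {
    awk -F'\t' -v re="$1" '
        NR == 1 { print; next }    # always keep the header line
        {
            for (i = 4; i <= 8; i++)
                if ($i ~ re) { print; break }
        }'
}

# Rows whose columns 4..8 contain a field that is exactly 6 or 8.
sample | search_cols '^[68]$'
```

Anchoring the pattern with `^` and `$` keeps a search for 6 from also matching 16 or 0.6. Patterns containing backslashes may need extra care, because awk processes escape sequences in values assigned with `-v`.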
Upvotes: 0