How to grep two columns from a single file

Question

cat Error00

4  0    375
4 2001   21
4 2002   20

cat Error01

4 0      465
4 2001   12
4 2002   40
4 2016   1

I want the output below:

4 0      375   465
4 2001   21    12
4 2002   20    40
4 2016   -     1

I am using the script below. The problem is that I cannot grep on two fields, because the key contains a space. Please suggest how to get around this.

keylist=$(awk '{print $1,$2'} Error0[0-1] | sort | uniq)
for key in ${keylist} ; do
echo ${key}
        val_a=$(grep "^${key}" Error00 | awk  '{print $3}') ;val_a=${val_a:---}
        val_b=$(grep "^${key}" Error01 | awk '{print $1,$2}') ; val_b=${val_b:---    --}
        echo $key  ${val_a} >>testreport
done

I am getting the output below:

4       375   465
0
4       21    12
2001
4       20    20
2002
4       -     1
2016

Jonathan Leffler · Accepted Answer

grep is not really the right tool for this job. You can either play with awk or Perl (or Python, or …), or you can use join. However, join only joins on a single column at a time, and you appear to need to join on two columns. So, we're going to have to massage the data so that it will work with join. I'm about to assume you're using bash and so have process substitution available. You can do the job without, but it is fiddlier and involves temporary files (and traps to clean them up, etc).

The key to the join will be to replace the blank between the first two columns with a colon (or any other convenient character — control-A would work fine too), then join the files on column 1 with a replacement character. The inputs must be sorted; the output must have the colon replaced with a blank.
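As a quick illustration of the fusing step on its own (a minimal sketch; fuse_key is a hypothetical helper name, not part of the solution), the sed command turns only the first run of blanks into a colon and leaves the rest of the line alone:

```shell
# fuse_key is a hypothetical name used only for this illustration.
# Replace the first run of one or more blanks with a colon so that
# columns 1 and 2 become a single join field.
fuse_key() {
    sed 's/  */:/' "$@"
}

printf '4  0    375\n4 2001   21\n' | fuse_key
# prints:
# 4:0    375
# 4:2001   21
```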

$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
>     <(sed 's/  */:/' Error00 | sort) \
>     <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /'
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1
$

The 's/  */:/' operation replaces the first sequence of one or more blanks with a colon; this matters because the input data has two blanks between the 4 and the 0 in the first line of Error00. The input to join must be in sorted order of the joining field, here the first field. The output is the join field, the second column of Error00 and the second column of Error01 (remembering that means the second column after the first two have been fused by the colon). If there's an unmatched line in the first file, generate an output line (-a 1); ditto for the second file; and for the missing fields, insert a dash (-e '-'). The final sed replaces the colon that was added with a blank.
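If process substitution is not available, the temporary-file variant mentioned earlier might look roughly like the sketch below. The function name join_two_keys is made up for this example, and it assumes mktemp -d exists on your system (it is common but not required by POSIX). The function body runs in a subshell so that the EXIT trap cleans up as soon as the function finishes.

```shell
# join_two_keys is a hypothetical helper, shown as a sketch only.
# Usage: join_two_keys Error00 Error01
join_two_keys() (
    # Assumption: mktemp -d is available.
    tmpdir=$(mktemp -d) || exit 1
    # The body is a subshell, so this trap fires when the function ends.
    trap 'rm -rf "$tmpdir"' EXIT

    # Fuse columns 1 and 2 with a colon, then sort for join.
    sed 's/  */:/' "$1" | sort > "$tmpdir/a"
    sed 's/  */:/' "$2" | sort > "$tmpdir/b"

    # Same join as with process substitution, then undo the colon.
    join -o 0,1.2,2.2 -a 1 -a 2 -e '-' "$tmpdir/a" "$tmpdir/b" |
        sed 's/:/ /'
)
```

The join options are unchanged; only the way the sorted inputs reach join differs.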

If you want the data formatted, pipe it through awk.

$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
>     <(sed 's/  */:/' Error00 | sort) \
>     <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /' |
> awk '{printf("%s %-6s %-6s %s\n", $1, $2, $3, $4)}'
4 0      375    465
4 2001   21     12
4 2002   20     40
4 2016   -      1
$
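For comparison, the awk-only route mentioned at the start could be sketched as follows. This is one possible approach rather than a tested drop-in: merge_counts is a hypothetical name, the script assumes exactly two input files given as two different path strings, and it assumes at most one line per key in each file. Since awk's "for (key in array)" order is unspecified, the result is sorted numerically afterwards.

```shell
# merge_counts is a hypothetical helper, shown as a sketch only.
# Usage: merge_counts Error00 Error01
merge_counts() {
    awk '
        {
            # Which input file is this line from? (1 or 2)
            idx = (FILENAME == ARGV[1]) ? 1 : 2
            # Key on the first two fields; store the third per file.
            key = $1 " " $2
            seen[key] = 1
            val[key, idx] = $3
        }
        END {
            for (key in seen) {
                # Use a dash where a file has no line for this key.
                a = ((key, 1) in val) ? val[key, 1] : "-"
                b = ((key, 2) in val) ? val[key, 2] : "-"
                split(key, k, " ")
                printf("%s %-6s %-6s %s\n", k[1], k[2], a, b)
            }
        }
    ' "$1" "$2" | sort -k1,1n -k2,2n
}
```

This avoids the colon trick entirely, at the cost of holding all keys in memory.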
