Dibakar Debnath

Reputation: 37

How to grep two columns from a single file

cat Error00

4  0    375
4 2001   21
4 2002   20

cat Error01

4 0      465
4 2001   12
4 2002   40
4 2016   1

I want the output to look like this:

4 0      375   465
4 2001   21    12
4 2002   20    40
4 2016   -     1

I am using the script below. The problem is that I am not able to grep on two fields, because of the space between them. Please suggest how I can get around this.

keylist=$(awk '{print $1,$2'} Error0[0-1] | sort | uniq)
for key in ${keylist} ; do
echo ${key}
        val_a=$(grep "^${key}" Error00 | awk  '{print $3}') ;val_a=${val_a:---}
        val_b=$(grep "^${key}" Error01 | awk '{print $1,$2}') ; val_b=${val_b:---    --}
        echo $key  ${val_a} >>testreport
done

I am getting the output below:

4       375   465
0
4       21    12
2001
4       20    20
2002
4       -     1
2016

Upvotes: 2

Views: 5493

Answers (2)

Chris Seymour

Reputation: 85775

A single awk one-liner can handle this easily:

awk 'FNR==NR{a[$1,$2]=$3;next}{print $1,$2,((($1,$2) in a)?a[$1,$2]:"-"),$3}' err0 err1
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1

For formatted output you can use printf instead of print, as Jonathan Leffler suggests:

printf "%s %-6s %-6s %s\n",$1,$2,((($1,$2) in a)?a[$1,$2]:"-"),$3
4 0      375    465
4 2001   21     12
4 2002   20     40
4 2016   -      1

However, a more general solution is to pipe the output through column -t for a neatly aligned table:

awk '{....}' err0 err1 | column -t
4  0     375  465
4  2001  21   12
4  2002  20   40
4  2016  -    1
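
One caveat: the one-liner above prints a row only for keys that occur in the second file, so a key that exists only in the first file would be dropped. The following is a minimal sketch of a full outer join in awk, not a drop-in replacement; the function name outer_join and the scratch directory are made up for illustration, and the sample data is copied from the question.

```shell
#!/bin/sh
# Sketch: full outer join on the first two columns, using awk only.
# outer_join and workdir are illustrative names, not from the original answer.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Sample data copied from the question.
printf '4  0    375\n4 2001   21\n4 2002   20\n' > "$workdir/Error00"
printf '4 0      465\n4 2001   12\n4 2002   40\n4 2016   1\n' > "$workdir/Error01"

outer_join() {
    awk '
        FNR == NR { a[$1, $2] = $3; next }   # first file: remember column 3 per key
        {
            # second file: the "in" test keeps a genuine count of 0
            left = (($1, $2) in a) ? a[$1, $2] : "-"
            print $1, $2, left, $3
            seen[$1, $2] = 1
        }
        END {
            # keys that appeared only in the first file
            for (k in a)
                if (!(k in seen)) {
                    split(k, part, SUBSEP)
                    print part[1], part[2], a[k], "-"
                }
        }
    ' "$1" "$2" | sort -k1,1n -k2,2n
}

outer_join "$workdir/Error00" "$workdir/Error01"
# prints:
# 4 0 375 465
# 4 2001 21 12
# 4 2002 20 40
# 4 2016 - 1
```

The trailing sort is there because awk's for (k in a) loop visits keys in no particular order. The sketch assumes the first file is not empty, since FNR==NR cannot tell the two files apart otherwise.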

Upvotes: 4

Jonathan Leffler

Reputation: 753525

grep is not really the right tool for this job. You can either play with awk or Perl (or Python, or …), or you can use join. However, join only joins on a single column at a time, and you appear to need to join on two columns. So, we're going to have to massage the data so that it will work with join. I'm about to assume you're using bash and so have process substitution available. You can do the job without, but it is fiddlier and involves temporary files (and traps to clean them up, etc).

The key to the join is to replace the blank between the first two columns with a colon (or any other convenient character; control-A would work fine too), then join the files on the fused column 1. The inputs must be sorted, and in the output the colon must be replaced with a blank again.

$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
>     <(sed 's/  */:/' Error00 | sort) \
>     <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /'
4 0 375 465
4 2001 21 12
4 2002 20 40
4 2016 - 1
$

The 's/  */:/' operation (note the two blanks before the star) replaces the first sequence of one or more blanks with a colon; it has to match a sequence because the input data has two blanks between the 4 and the 0 in the first line of Error00. The input to join must be sorted on the joining field, here the first field. The output is the join field, the second column of Error00 and the second column of Error01 (remembering that this means the second column after the first two have been fused by the colon). If there's an unmatched line in the first file, generate an output line (-a 1); ditto for the second file (-a 2); and for the missing fields, insert a dash (-e '-'). The final sed replaces the colon that was added with a blank.
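
To see the key-fusing step in isolation, here is a small sketch that runs the same sed expression on the first line of Error00; fuse_key is just a name chosen for this illustration.

```shell
# Sketch: what the key-fusing sed does to one line from Error00.
# fuse_key is an illustrative wrapper, not part of the pipeline above.
fuse_key() {
    # Replace only the first run of blanks with a colon.
    sed 's/  */:/'
}

printf '4  0    375\n' | fuse_key
# prints: 4:0    375
```

Only the first run of blanks is replaced, so the remaining blanks still separate the fused key from the value column when join reads the line.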

If you want the data formatted, pipe it through awk.

$ join -o 0,1.2,2.2 -a 1 -a 2 -e '-' \
>     <(sed 's/  */:/' Error00 | sort) \
>     <(sed 's/  */:/' Error01 | sort) |
> sed 's/:/ /' |
> awk '{printf("%s %-6s %-6s %s\n", $1, $2, $3, $4)}'
4 0      375    465
4 2001   21     12
4 2002   20     40
4 2016   -      1
$
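
For shells without process substitution, the temporary-file variant mentioned at the start might look roughly like the sketch below. The function name join_two_cols is made up for illustration, and forcing LC_ALL=C is an extra precaution of this sketch (so that sort and join agree on ordering), not something the commands above do.

```shell
#!/bin/sh
# Sketch: the same join using temporary files instead of process substitution.
# join_two_cols is an illustrative name; the body runs in a subshell so that
# the EXIT trap and the locale setting stay local to the function.
join_two_cols() (
    LC_ALL=C
    export LC_ALL                        # assumption: byte-order collation for sort and join
    tmpdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmpdir"' EXIT         # clean up the temporary files on exit

    # Fuse the first two columns with a colon, then sort on the fused key.
    sed 's/  */:/' "$1" | sort > "$tmpdir/left"
    sed 's/  */:/' "$2" | sort > "$tmpdir/right"

    # Full outer join on the fused key, then split the key again.
    join -o 0,1.2,2.2 -a 1 -a 2 -e '-' "$tmpdir/left" "$tmpdir/right" |
        sed 's/:/ /'
)
```

Calling join_two_cols Error00 Error01 should then give the same four lines as the unformatted join above.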

Upvotes: 1
