Aleix

Reputation: 127

Compare two files and print the lines that have matching columns with awk

Using AWK, I need to print the lines of the second file whose first column matches a line in the first file.

FIRST FILE (comprobacio.txt):

2187405XJ4228N0001RX
42379999999997GH0002 
517878G4RSD407yJK4NY
4237405HHYT4323H0002
517P0P0P06GH9001233F
517878G4R67TRRHOPPNY
423123R66677789323H2

SECOND FILE (datos.txt):

2187405XJ4228N0001RX@1984@216@230 08m 06s N, 82o 21m 34s W 
4237405XJK4N37GH0002@2010@54@400 02m Ols N, 80o 20m 12s W 
517878G4RSO405XJK4NY@1954@103@400 42m 51s N, 74o 06m 21s E 
4237405HHYT4323H0002@2006@55@300 04m Ols N, 810 20m 12s W 
517POLIJ56GH9001233F@2010@803@400 52m 52s N, 74o 06m 70s E 
517878G4R67TRRHOPPNY@1954@108@400 42m 51s N, 74o 05m 21s E 
4237405899544T4323H2@2000@5778@390 12m 07s N, 900 10m 12s W 

OUTPUT EXPECTED

2187405XJ4228N0001RX@1984@216@230 08m 06s N, 82o 21m 34s W 
4237405HHYT4323H0002@2006@55@300 04m Ols N, 810 20m 12s W
517878G4R67TRRHOPPNY@1954@108@400 42m 51s N, 74o 05m 21s E

I first tried editing the second file with sed to replace the '@' character with a space ' ', then piping the result into AWK to print the lines that have the same first column, but it doesn't output anything.

sed 's/@/ /g' datos.txt | awk 'FNR==NR{array[$1];next} $1 in array {print $0}' datos.txt comprobacio.txt

Any idea of what I'm getting wrong?

Upvotes: 2

Views: 1232

Answers (3)

dawg

Reputation: 103744

You can use join on sorted files in this case (here file1 is comprobacio.txt and file2 is datos.txt):

join -1 1 -2 1 -t @ <(sort file1) <(sort file2) 
2187405XJ4228N0001RX@1984@216@230 08m 06s N, 82o 21m 34s W 
4237405HHYT4323H0002@2006@55@300 04m Ols N, 810 20m 12s W 
517878G4R67TRRHOPPNY@1954@108@400 42m 51s N, 74o 05m 21s E 
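
The <(...) process substitution above needs a shell such as bash. As a rough sketch of the same idea for a plain POSIX sh, the sorted copies can be written to temporary files first. The scratch directory, the shortened sample lines, and the `joined` variable are only illustrative assumptions, not part of the original command.

```shell
#!/bin/sh
# Scratch copies of the two inputs, trimmed down for illustration.
tmp=$(mktemp -d)
printf '%s\n' 517878G4RSD407yJK4NY 2187405XJ4228N0001RX > "$tmp/comprobacio.txt"
printf '%s\n' \
  '517878G4RSO405XJK4NY@1954@103' \
  '2187405XJ4228N0001RX@1984@216' > "$tmp/datos.txt"

# join expects both inputs sorted on the join field, so sort them first.
sort "$tmp/comprobacio.txt" > "$tmp/keys.sorted"
sort "$tmp/datos.txt" > "$tmp/data.sorted"

# -t @ makes "@" the field separator; the join field defaults to field 1 in both files.
joined=$(join -t @ "$tmp/keys.sorted" "$tmp/data.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

Only the key that appears in both files survives the join. The other key differs in a few characters and is dropped.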

Upvotes: 1

Léa Gris

Reputation: 19545

Use grep like this:

grep -Ff comprobacio.txt datos.txt

grep options used:

   -F, --fixed-strings
          Interpret PATTERNS as fixed strings, not regular expressions.

   -f FILE, --file=FILE
          Obtain patterns from FILE, one per line.  If this option is used
          multiple  times  or  is  combined with the -e (--regexp) option,
          search for all patterns given.  The  empty  file  contains  zero
          patterns, and therefore matches nothing.
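
One thing to keep in mind is that fixed-string patterns match anywhere in the line, not only in the first column. The small sketch below illustrates this. The file names, the `grep_hits` variable, and the third data line (where the key shows up in the second column) are made up for the demonstration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf '%s\n' 2187405XJ4228N0001RX > "$tmp/keys.txt"
# The last line is hypothetical: the key appears in the second column only.
printf '%s\n' \
  '2187405XJ4228N0001RX@1984@216' \
  '4237405HHYT4323H0002@2006@55' \
  'OTHERKEY000000000000@2187405XJ4228N0001RX@1' > "$tmp/data.txt"

# -F treats each key as a literal string; -f reads the keys from the file.
grep_hits=$(grep -Ff "$tmp/keys.txt" "$tmp/data.txt")

printf '%s\n' "$grep_hits"
rm -rf "$tmp"
```

Both the first and the third line are printed here. If keys can never occur in other columns of your data, as in the question's sample, this makes no difference.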

Upvotes: 0

user000001

Reputation: 33317

Try like this:

awk -F '@' 'NR==FNR{a[$0];next} $1 in a' comprobacio.txt datos.txt 
2187405XJ4228N0001RX@1984@216@230 08m 06s N, 82o 21m 34s W 
4237405HHYT4323H0002@2006@55@300 04m Ols N, 810 20m 12s W 
517878G4R67TRRHOPPNY@1954@108@400 42m 51s N, 74o 05m 21s E

We set the field separator FS to the @ symbol with -F '@'.
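
To make the two-file idiom easier to follow, here is a minimal, self-contained sketch of it on a trimmed-down copy of the sample data. The scratch directory, the array name `keys`, and the `matches` variable are chosen only for illustration.

```shell
#!/bin/sh
# Build small copies of the two files in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 2187405XJ4228N0001RX 517878G4RSD407yJK4NY > "$tmp/comprobacio.txt"
printf '%s\n' \
  '2187405XJ4228N0001RX@1984@216' \
  '517878G4RSO405XJK4NY@1954@103' > "$tmp/datos.txt"

# NR==FNR holds only while the first file is being read:
# store each whole line as a key, then skip to the next line.
# For the second file, print the line if its first @-separated field is a stored key.
matches=$(awk -F '@' 'NR==FNR { keys[$0]; next } $1 in keys' \
  "$tmp/comprobacio.txt" "$tmp/datos.txt")

printf '%s\n' "$matches"   # prints 2187405XJ4228N0001RX@1984@216
rm -rf "$tmp"
```

Because the whole line ($0) of the first file is used as the key, stray trailing whitespace in comprobacio.txt would keep that line from matching.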

But the problem in your code is that you are piping into awk while also passing it file names. When awk is given file arguments, it reads only those files and ignores STDIN, so the output of sed is never used. To read from STDIN as well, pass - as a file name to denote STDIN, as below:

sed 's/@/ /g' datos.txt | awk 'FNR==NR{array[$1];next} $1 in array {print $0}' comprobacio.txt -
2187405XJ4228N0001RX 1984 216 230 08m 06s N, 82o 21m 34s W 
4237405HHYT4323H0002 2006 55 300 04m Ols N, 810 20m 12s W 
517878G4R67TRRHOPPNY 1954 108 400 42m 51s N, 74o 05m 21s E 

Note the trailing minus symbol (-).
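
A quick way to convince yourself of this behaviour is the sketch below, which pipes one line into awk with and without the - operand. The file name `f.txt` and the variable names are placeholders for the demonstration.

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'from file\n' > "$tmp/f.txt"

# Only a file operand: awk reads the file and never touches the pipe.
without_dash=$(printf 'from stdin\n' | awk '{ print }' "$tmp/f.txt")

# With "-" as an extra operand, standard input is read at that position.
with_dash=$(printf 'from stdin\n' | awk '{ print }' "$tmp/f.txt" -)

printf '%s\n---\n%s\n' "$without_dash" "$with_dash"
rm -rf "$tmp"
```

The first call prints only the file's line, while the second prints the file's line followed by the piped one.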

Another option would be to use process substitution, something like:

awk 'FNR==NR{array[$1];next} $1 in array {print $0}' comprobacio.txt  <(sed 's/@/ /g' datos.txt)
2187405XJ4228N0001RX 1984 216 230 08m 06s N, 82o 21m 34s W 
4237405HHYT4323H0002 2006 55 300 04m Ols N, 810 20m 12s W 
517878G4R67TRRHOPPNY 1954 108 400 42m 51s N, 74o 05m 21s E 

Note that the @ symbols in the output are replaced with spaces in this case.

Upvotes: 2
