Muhammad Abdullah

Reputation: 19

Comparing two files using Awk in linux

I have two files, File A and File B. The structure of File A is shown below:

3314530275|76|1|20240422045006|
3335984469|64|2|20150804235959|
3367892381|203|3|20141025235959|
3369039388|203|4|20131219235959|

The contents of the second File B are given below:

3314530275|2000|999000000073101614|0|20370101000000|76|
3314530275|2000|999000000073101614|0|20370101000000|76|
3369039388|2000|812000002628721|-112|20360101235959|203|
3335984469|5037|5210367877660|180|20150213000000|64|
3335984469|5048|5210367877661|6|20150213000000|64|
3335984469|2000|812000002629182|1913|20360101235959|64|
3367892381|5014|5210365185964|419430400|20150308000000|203|
3367892381|5044|5210365185965|226020|20150308000000|203|
3367892381|2000|817000102009605|0|20360101235959|203|

The script should first read File A and, for every line whose third field ($3) equals 2, store the values of its first ($1) and fourth ($4) columns.

It should then check, for each line of File B, whether that line's $1 is among the $1 values stored in the first step.

  1. If the value is present and the second field equals 2000, it should print $1, $2, $4, and the fourth-column value stored from File A.

  2. If the value is present and the second field is not equal to 2000, it should print $1, $2, $4, $5.

Expected output for the sample data above:

3335984469|5037|180|20150213000000|
3335984469|5048|6|20150213000000|
3335984469|2000|1913|20150804235959|

This is what I have so far:

awk -F \| 'FNR==NR { if ($3 == 2) a[$1] = $4; next } ($1 in a) { if ($2 == 2000) print $1 "|" $2 "|" $4 "|" a[$1] "|" } ($1 in a) { if ($2 != 2000) print $1 "|" $2 "|" $4 "|" $5 "|" }' FileA FileB > Output_File

Any help will be greatly appreciated.

Upvotes: 2

Views: 167

Answers (2)

repzero

Reputation: 8412

awk 'BEGIN{FS=OFS="|"};FNR==NR{if($3==2){a[$1]=$4};next};{if( $1 in a && $2==2000 ){print $1,$2,$4,a[$1]}else if ($1 in a && $2!=2000){print $1,$2,$4,$5}}' 'fileA' 'fileB'

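Adjustments I made to your command line to get the one above: FS and OFS are both set to "|" in a BEGIN block, so the commas in print emit the separator, and the two ($1 in a) blocks are merged into a single if/else:

if( $1 in a && $2==2000 ){print $1,$2,$4,a[$1]}

else if ($1 in a && $2!=2000){print $1,$2,$4,$5}

Results (without the trailing | shown in the requested output):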

3335984469|5037|180|20150213000000
3335984469|5048|6|20150213000000
3335984469|2000|1913|20150804235959
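The commas in print $1,$2,$4,a[$1] produce pipe-separated output only because OFS is set to "|" in the BEGIN block; without it, awk separates printed fields with a single space. A minimal illustration using one line of File B, assuming a POSIX sh and any standard awk:

```shell
#!/bin/sh
# With FS and OFS both "|", each comma in print inserts OFS between the values.
printf '3335984469|5037|5210367877660|180|20150213000000|64|\n' |
  awk 'BEGIN { FS = OFS = "|" } { print $1, $2, $4, $5 }'

# With only FS set, OFS keeps its default (a single space).
printf '3335984469|5037|5210367877660|180|20150213000000|64|\n' |
  awk 'BEGIN { FS = "|" } { print $1, $2, $4, $5 }'
```

The first command prints 3335984469|5037|180|20150213000000 and the second prints the same values separated by spaces.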

Upvotes: 0

Ed Morton

Reputation: 203209

Your script will work as-is given correct contents of fileA (335984469 in FileA should be 3335984469, i.e. with one more leading 3), but it can be simplified to:

$ cat tst.awk
BEGIN{ FS=OFS="|" }
FNR==NR { if ($3==2) a[$1] = $4; next }
$1 in a { print $1, $2, $4, ($2==2000 ? a[$1] : $5), "" }

$ awk -f tst.awk fileA fileB
3335984469|5037|180|20150213000000|
3335984469|5048|6|20150213000000|
3335984469|2000|1913|20150804235959|

Feel free to cram it all back onto one line if you find that useful.
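Here is the program crammed onto one line, comparing $2 against 2000 as the question requires, together with the sample files from the question so the output can be reproduced. A sketch assuming a POSIX shell; it writes fileA and fileB in the current directory, so run it in an empty one.

```shell
#!/bin/sh
# Recreate the sample input from the question.
cat > fileA <<'EOF'
3314530275|76|1|20240422045006|
3335984469|64|2|20150804235959|
3367892381|203|3|20141025235959|
3369039388|203|4|20131219235959|
EOF
cat > fileB <<'EOF'
3314530275|2000|999000000073101614|0|20370101000000|76|
3314530275|2000|999000000073101614|0|20370101000000|76|
3369039388|2000|812000002628721|-112|20360101235959|203|
3335984469|5037|5210367877660|180|20150213000000|64|
3335984469|5048|5210367877661|6|20150213000000|64|
3335984469|2000|812000002629182|1913|20360101235959|64|
3367892381|5014|5210365185964|419430400|20150308000000|203|
3367892381|5044|5210365185965|226020|20150308000000|203|
3367892381|2000|817000102009605|0|20360101235959|203|
EOF

# One-line form: store $4 keyed by $1 for fileA lines with $3==2, then for
# matching fileB lines print $4 of fileA when $2==2000, else fileB's own $5.
# The trailing "" produces the final "|" the question asks for.
awk 'BEGIN{FS=OFS="|"} FNR==NR{if ($3==2) a[$1]=$4; next} $1 in a{print $1, $2, $4, ($2==2000 ? a[$1] : $5), ""}' fileA fileB
```

This prints exactly the three lines of the expected output from the question.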

If the above doesn't work, check both of your input files for control characters, most likely control-Ms (carriage returns), as generously donated by Microsoft whenever their tools create files. You can check for them using cat -v, which displays each one as ^M, and remove them with dos2unix or similar.
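If dos2unix isn't installed, awk itself can detect and strip a trailing carriage return. A minimal sketch assuming a POSIX sh and an awk that understands \r in regular expressions (gawk, mawk and BWK awk do); crlf_sample.txt is a hypothetical file created just for the demonstration:

```shell
#!/bin/sh
# Hypothetical CRLF-terminated line, as a Windows tool might write it.
printf '3335984469|64|2|20150804235959|\r\n' > crlf_sample.txt

# cat -v shows the carriage return as ^M at the end of the line.
cat -v crlf_sample.txt

# Count the lines that end in a carriage return.
awk '/\r$/ { n++ } END { print n + 0 }' crlf_sample.txt

# Stand-in for dos2unix: remove one trailing \r per line, then print the line.
awk '{ sub(/\r$/, "") } 1' crlf_sample.txt > crlf_sample.unix
cat -v crlf_sample.unix
```

The same sub(/\r$/, "") call can also be added as the first rule of tst.awk, so the records are cleaned before any field is compared or stored.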

Upvotes: 1
