cheapcoder

Reputation: 303

How to compare two files line by line and output the whole line if different

I have two sorted files:

1) a control file (ctrl.txt), which is generated by an external process
2) a line count file (count.txt), which I generate using `wc -l`

$ more ctrl.txt

Thunderbird|1000
Mustang|2000
Hurricane|3000

$ more count.txt

Thunder_bird|1000
MUSTANG|2000
Hurricane|3001

I want to compare these two files while ignoring cosmetic differences in column 1 (the filenames), such as "_" (in Thunder_bird) or upper case (in MUSTANG), so that the output shows only the one file whose counts really don't match:

Hurricane|3000

My idea is to compare only the second column of the two files and output the whole line if the values differ.

I have seen other AWK examples, but I could not get any of them to work.

Upvotes: 1

Views: 60

Answers (1)

RavinderSingh13

Reputation: 133428

Could you please try the following awk command and let me know if it helps?

awk -F"|" 'FNR==NR{gsub(/_/,"");a[tolower($1)]=$2;next} {gsub(/_/,"")} ((tolower($1) in a) && $2!=a[tolower($1)])' ctrl.txt count.txt

Here is a non-one-liner form of the solution too.

awk -F"|" '
FNR==NR{
  gsub(/_/,"");
  a[tolower($1)]=$2;
  next}
{ gsub(/_/,"") }
((tolower($1) in a) && $2!=a[tolower($1)])
' ctrl.txt count.txt
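
As a quick check, here is a minimal sketch that rebuilds the two sample files from the question in a scratch directory and runs the same awk program on them. The `workdir` and `out` variables and the use of `mktemp -d` are only assumptions for illustration, and the control file is named ctrl.txt as in the question.

```shell
# Hypothetical scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Recreate the sample data shown in the question.
printf '%s\n' 'Thunderbird|1000' 'Mustang|2000' 'Hurricane|3000' > "$workdir/ctrl.txt"
printf '%s\n' 'Thunder_bird|1000' 'MUSTANG|2000' 'Hurricane|3001' > "$workdir/count.txt"

# Same program as above: remember counts from ctrl.txt, then print
# every count.txt line whose count differs for the same normalised name.
out=$(awk -F"|" '
FNR==NR{
  gsub(/_/,"");
  a[tolower($1)]=$2;
  next}
{ gsub(/_/,"") }
((tolower($1) in a) && $2!=a[tolower($1)])
' "$workdir/ctrl.txt" "$workdir/count.txt")

printf '%s\n' "$out"
```

With this sample data only the Hurricane counts differ, so the command prints Hurricane|3001, which is the line from count.txt. Note that the printed line has already had its underscores removed by gsub.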

Explanation: here is a line-by-line explanation of the above code.

awk -F"|" '                                ##Set the field separator to | (pipe) for all lines of the input files.
FNR==NR{                                   ##FNR==NR is TRUE only while the first input file (ctrl.txt here) is being read; the following block runs only then.
  gsub(/_/,"");                            ##Use awk's gsub function to globally replace _ with an empty string in the current line.
  a[tolower($1)]=$2;                       ##Store $2 of the current line in array a, indexed by the first field in lower case so that case differences are ignored.
  next}                                    ##next is a built-in awk keyword that skips all remaining rules for this line, so they run only while the second input file (count.txt) is being read.
{ gsub(/_/,"") }                           ##Statements from here on run while the second input file is being read; gsub removes all occurrences of _ from the line.
((tolower($1) in a) && $2!=a[tolower($1)]) ##Check whether the lower-case form of $1 is present in array a and $2 of the current line is NOT equal to the value stored in a. If so, the current line is printed: no action is given, so awk's default action prints the current line from count.txt.
' ctrl.txt count.txt                       ##The input file names passed to awk.
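
The question's expected output shows the line from the control file (Hurricane|3000) rather than the one from count.txt. If that is what is needed, one possible variation is sketched below. It is only a sketch under assumptions: the `key` variable and the `count` and `line` arrays are names introduced here for illustration, and `workdir` is a hypothetical scratch directory holding the sample data from the question.

```shell
# Hypothetical scratch directory with the sample data from the question.
workdir=$(mktemp -d)
printf '%s\n' 'Thunderbird|1000' 'Mustang|2000' 'Hurricane|3000' > "$workdir/ctrl.txt"
printf '%s\n' 'Thunder_bird|1000' 'MUSTANG|2000' 'Hurricane|3001' > "$workdir/count.txt"

out=$(awk -F'|' '
# Build a normalised key for every line: lower case, underscores removed.
# Only the key is changed, so the line itself stays as it was read.
{ key = tolower($1); gsub(/_/, "", key) }

# First file (ctrl.txt): remember its count and its original line.
FNR==NR { count[key] = $2; line[key] = $0; next }

# Second file (count.txt): on a mismatch, print the control file line.
(key in count) && $2 != count[key] { print line[key] }
' "$workdir/ctrl.txt" "$workdir/count.txt")

printf '%s\n' "$out"
```

With the sample data this prints Hurricane|3000. Because only `key` is normalised, the printed line keeps any underscores and upper-case letters exactly as they appear in the control file.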

Upvotes: 1
