Reputation: 187

Validate header of file in linux

I am using the script below to validate the header of a file. For this I have created one file (key) that contains only the header, and I compare it with another file that contains the same header followed by data rows.

awk -F"|" 'FNR==NR {hn=split($0,header); next}
     FNR==1 {
         n=split($0,fh)
         for (i=0; i<=hn; i++)
             if (fh[i]!=header[i]) {
                 printf "%s:order of %s is not correct\n", FILENAME, header[i]
                 next
             }
         if (hn==n)
             print FILENAME, "has expected order of fields"
         else
             print FILENAME, "has extra fields"
         next
     }' key /Scripts/gst/Kenan_Test_Scenarios1.txt

File 2, header along with data (Kenan_Test_Scenarios1.txt):

SourceIdentifier|SourceFileName|GLAccountCode|Division|SubDivision|ProfitCentre1|ProfitCentre2|PlantCode|ReturnPeriod|SupplierGSTIN|DocumentType|SupplyType|DocumentNumber|DocumentDate|OriginalDocumentNumber|OriginalDocumentDate|CRDRPreGST|LineNumber|CustomerGSTIN|UINorComposition|OriginalCustomerGSTIN|CustomerName|CustomerCode|BillToState|ShipToState|POS|PortCode|ShippingBillNumber|ShippingBillDate|FOB|ExportDuty|HSNorSAC|ProductCode|ProductDescription|CategoryOfProduct|UnitOfMeasurement|Quantity|TaxableValue|IntegratedTaxRate|IntegratedTaxAmount|CentralTaxRate|CentralTaxAmount|StateUTTaxRate|StateUTTaxAmount|CessRateAdvalorem|CessAmountAdvalorem|CessRateSpecific|CessAmountSpecific|InvoiceValue|ReverseChargeFlag|TCSFlag|eComGSTIN|ITCFlag|ReasonForCreditDebitNote|AccountingVoucherNumber|AccountingVoucherDate|Userdefinedfield1|Userdefinedfield2|Userdefinedfield3
KEN|TEST1|||Tela|Outw|ANP|POST|1017|36AAA|NV|TX|4841446542|2017-12-12||2035-06-11|Y|1|36AAACB89|||||||36||||||94||Telecomm Servi||||1557.20|0.00|10.00|9.00|140.15|9.00|140.15|||||18.50||||||||B2B INV||

I get the output below, which is not correct, even though the headers in both files are the same:

 is not correctnan_Test_Scenarios1.txt:order of Userdefinedfield3

Could you please help me fix the code? I also need to report every header name that does not match, not just the first one.
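The garbled line in the output above is consistent with a carriage return at the end of the key header: printf prints "Userdefinedfield3\r", the \r moves the cursor back to column 0, and " is not correct" then overwrites the start of the file name. That suggests key has Windows (CRLF) line endings. Separately, the next inside the loop stops at the first mismatch, so only one problem is ever reported. Below is a minimal sketch, assuming bash and a POSIX awk, that strips a trailing \r from every line, checks every position, and reports missing or extra fields; check_header is a hypothetical name, not something from the original script.

```shell
#!/usr/bin/env bash
# check_header (hypothetical name): compare line 1 of each DATAFILE against KEYFILE.
# Usage: check_header KEYFILE DATAFILE...
check_header() {
    awk -F'|' '
        { sub(/\r$/, "") }                            # drop a trailing CR (CRLF files)
        FNR == NR { hn = split($0, header); next }    # first file: expected header
        FNR == 1 {                                    # line 1 of every later file
            n = split($0, fh)
            bad = 0
            max = (hn > n) ? hn : n
            for (i = 1; i <= max; i++) {              # keep going after a mismatch
                if (i > n) {
                    printf "%s: missing field %d (%s)\n", FILENAME, i, header[i]
                    bad++
                } else if (i > hn) {
                    printf "%s: extra field %d (%s)\n", FILENAME, i, fh[i]
                    bad++
                } else if (fh[i] != header[i]) {
                    printf "%s: field %d is %s, expected %s\n", FILENAME, i, fh[i], header[i]
                    bad++
                }
            }
            if (bad == 0)
                print FILENAME, "has expected order of fields"
        }
    ' "$@"
}
```

Usage would be check_header key /Scripts/gst/Kenan_Test_Scenarios1.txt; more data files can be listed after key, since each file's line 1 is checked separately.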

Upvotes: 1

Answers (2)

karakfa

Reputation: 67567

This may come in handy:

$ diff -y --suppress-common-lines  <(tr '|' '\n' <file1) <(tr '|' '\n' <file2)

I used your first file as-is for file1 and used this

$ sed 's/2/8/;s/Export/Import/' file1 > file2

to create the second file. Running the diff command gives:

ProfitCentre2                                                 | ProfitCentre8
ExportDuty                                                    | ImportDuty
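Run against a file that also has data rows, such as Kenan_Test_Scenarios1.txt, the tr pipeline above splits every row into lines, and a CRLF file leaves a \r on the last name. A minimal sketch that compares only line 1 of each file with any \r removed, assuming bash (for process substitution) and a diff that supports -y as used above; header_diff is a hypothetical name.

```shell
#!/usr/bin/env bash
# header_diff (hypothetical name): side-by-side diff of the header lines only.
# Usage: header_diff KEYFILE DATAFILE
header_diff() {
    diff -y --suppress-common-lines \
        <(head -n 1 "$1" | tr -d '\r' | tr '|' '\n') \
        <(head -n 1 "$2" | tr -d '\r' | tr '|' '\n')
}
```

Because diff exits with 0 when the inputs match and 1 when they differ, header_diff key /Scripts/gst/Kenan_Test_Scenarios1.txt can also be used directly as an if condition.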

Upvotes: 0

Sobrique

Reputation: 53508

OK, you've tagged this perl, so here's a perl answer. I think you're focussing on the wrong problem - why not instead read row by row, parse them into a hash, and then output your desired ordering:

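#!/usr/bin/env perl

use strict;
use warnings;

use Data::Dumper;

open ( my $first_file, '<', 'file_name_here' ) or die $!;
my $header_line = <$first_file>;
$header_line =~ s/\r?\n\z//;    # strip LF or CRLF line ending
my @header = split /\|/, $header_line, -1;
close ( $first_file );
#debugging
print Dumper \@header;

open ( my $second_file, '<', 'second_file_name_here' ) or die $!;
my $second_header_line = <$second_file>;
$second_header_line =~ s/\r?\n\z//;
my @second_header = split /\|/, $second_header_line, -1;

print join ( "|", @header ), "\n";
while ( <$second_file> ) {
    s/\r?\n\z//;    # strip the line ending so it doesn't end up in the last field
    my %row;
    #use ordering of column headings to read into named fields;
    #a limit of -1 keeps trailing empty fields such as "...||"
    @row{@second_header} = split /\|/, $_, -1;
    #debugging output to show you what's going on.
    print Dumper \%row;

    #columns missing from the second file print as empty
    print join ("|", map { $_ // '' } @row{@header} ), "\n";
}
close ( $second_file );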

That way you don't care if the order is wrong, because you forward fix it.

If you really need to compare, then you can iterate over @header and @second_header and look for differences. But that depends on what you're actually trying to get; I would suggest looking at Array::Utils, because it lets you trivially use array_diff, intersect and unique.
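If only the set of names matters and not their order (roughly what array_diff from Array::Utils gives you), the same comparison can be sketched in the shell with sort and comm. This assumes bash; header_set_diff is a hypothetical name. Names found only in the first file are printed flush left, and names found only in the second file are indented by a tab.

```shell
#!/usr/bin/env bash
# header_set_diff (hypothetical name): names present in only one of the two headers.
# Usage: header_set_diff KEYFILE DATAFILE
header_set_diff() {
    # comm needs both inputs sorted with the same collation, hence LC_ALL=C on both
    LC_ALL=C comm -3 \
        <(head -n 1 "$1" | tr -d '\r' | tr '|' '\n' | LC_ALL=C sort) \
        <(head -n 1 "$2" | tr -d '\r' | tr '|' '\n' | LC_ALL=C sort)
}
```

Empty output means both headers contain the same names, possibly in a different order.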

Upvotes: 2
