Reputation: 187
I am using the script below to validate the header of a file. I created one file that contains only the header, and I compare it with another file that contains the header followed by data rows.
awk -F"|" 'FNR==NR{hn=split($0,header); next}
FNR==1 {n=split($0,fh)
    for(i=0;i<=hn; i++)
        if (fh[i]!=header[i]) {
            printf "%s:order of %s is not correct\n",FILENAME, header[i]
            next}
    if (hn==n)
        print FILENAME, "has expected order of fields"
    else
        print FILENAME, "has extra fields"
    next
}' key /Scripts/gst/Kenan_Test_Scenarios1.txt
Sample header file (key):
SourceIdentifier|SourceFileName|GLAccountCode|Division|SubDivision|ProfitCentre1|ProfitCentre2|PlantCode|ReturnPeriod|SupplierGSTIN|DocumentType|SupplyType|DocumentNumber|DocumentDate|OriginalDocumentNumber|OriginalDocumentDate|CRDRPreGST|LineNumber|CustomerGSTIN|UINorComposition|OriginalCustomerGSTIN|CustomerName|CustomerCode|BillToState|ShipToState|POS|PortCode|ShippingBillNumber|ShippingBillDate|FOB|ExportDuty|HSNorSAC|ProductCode|ProductDescription|CategoryOfProduct|UnitOfMeasurement|Quantity|TaxableValue|IntegratedTaxRate|IntegratedTaxAmount|CentralTaxRate|CentralTaxAmount|StateUTTaxRate|StateUTTaxAmount|CessRateAdvalorem|CessAmountAdvalorem|CessRateSpecific|CessAmountSpecific|InvoiceValue|ReverseChargeFlag|TCSFlag|eComGSTIN|ITCFlag|ReasonForCreditDebitNote|AccountingVoucherNumber|AccountingVoucherDate|Userdefinedfield1|Userdefinedfield2|Userdefinedfield3
Second file, header along with data (Kenan_Test_Scenarios1.txt):
SourceIdentifier|SourceFileName|GLAccountCode|Division|SubDivision|ProfitCentre1|ProfitCentre2|PlantCode|ReturnPeriod|SupplierGSTIN|DocumentType|SupplyType|DocumentNumber|DocumentDate|OriginalDocumentNumber|OriginalDocumentDate|CRDRPreGST|LineNumber|CustomerGSTIN|UINorComposition|OriginalCustomerGSTIN|CustomerName|CustomerCode|BillToState|ShipToState|POS|PortCode|ShippingBillNumber|ShippingBillDate|FOB|ExportDuty|HSNorSAC|ProductCode|ProductDescription|CategoryOfProduct|UnitOfMeasurement|Quantity|TaxableValue|IntegratedTaxRate|IntegratedTaxAmount|CentralTaxRate|CentralTaxAmount|StateUTTaxRate|StateUTTaxAmount|CessRateAdvalorem|CessAmountAdvalorem|CessRateSpecific|CessAmountSpecific|InvoiceValue|ReverseChargeFlag|TCSFlag|eComGSTIN|ITCFlag|ReasonForCreditDebitNote|AccountingVoucherNumber|AccountingVoucherDate|Userdefinedfield1|Userdefinedfield2|Userdefinedfield3
KEN|TEST1|||Tela|Outw|ANP|POST|1017|36AAA|NV|TX|4841446542|2017-12-12||2035-06-11|Y|1|36AAACB89|||||||36||||||94||Telecomm Servi||||1557.20|0.00|10.00|9.00|140.15|9.00|140.15|||||18.50||||||||B2B INV||
I get the output below, which is not correct, because the headers in both files are the same.
is not correctnan_Test_Scenarios1.txt:order of Userdefinedfield3
Could you please help me correct the code? I also need it to report every header name that mismatches, not just the first one.
Upvotes: 1
Views: 904
Reputation: 67567
This may come in handy:
$ diff -y --suppress-common-lines <(tr '|' '\n' <file1) <(tr '|' '\n' <file2)
I used your first file as is for file1 and used this
$ sed 's/2/8/;s/Export/Import/' file1 > file2
to create the second file. Running the diff command gives
ProfitCentre2 | ProfitCentre8
ExportDuty | ImportDuty
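The command above compares whole files, so the data rows in a real second file would show up as differences too. A minimal sketch that restricts the comparison to the first line of each file is below. It also deletes carriage returns, on the assumption (not confirmed in the question) that the data file has Windows line endings, which would explain why the message in the question's output appears to overwrite the start of the file name. `compare_headers` is a hypothetical helper name, and plain `diff` is used so the sketch does not rely on options outside POSIX `diff`.

```shell
#!/bin/sh
# compare_headers EXPECTED ACTUAL   (hypothetical helper name)
# Compares only the first line of each file, one field per line.
compare_headers() {
    tmpdir=$(mktemp -d) || return 2
    # Take the first line only, drop any carriage return (assumed CRLF input),
    # then put each pipe-separated field on its own line.
    head -n 1 "$1" | tr -d '\r' | tr '|' '\n' > "$tmpdir/expected"
    head -n 1 "$2" | tr -d '\r' | tr '|' '\n' > "$tmpdir/actual"
    # diff prints nothing and returns 0 when the headers are identical,
    # and returns 1 with the differing field names otherwise.
    diff "$tmpdir/expected" "$tmpdir/actual"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

Called as `compare_headers key /Scripts/gst/Kenan_Test_Scenarios1.txt`, it should print nothing when the two headers match. Lines starting with `<` come from the expected header and lines starting with `>` from the data file.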
Upvotes: 0
Reputation: 53508
OK, you've tagged this perl, so here's a perl answer. I think you're focussing on the wrong problem - why not instead read row by row, parse them into a hash, and then output your desired ordering:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
open ( my $first_file, '<', 'file_name_here' ) or die $!;
chomp ( my @header = split /\|/, <$first_file> );
close ( $first_file );
#debugging
print Dumper \@header;
open ( my $second_file, '<', 'second_file_name_here' ) or die $!;
chomp ( my @second_header = split /\|/, <$second_file> );
print join ( "|", @header ), "\n";
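while ( <$second_file> ) {
    #strip the newline so it does not end up in the last field
    chomp;
    my %row;
    #use ordering of column headings to read into named fields;
    #a limit of -1 keeps trailing empty fields instead of dropping them
    @row{@second_header} = split /\|/, $_, -1;
    #debugging output to show you what's going on.
    print Dumper \%row;
    print join ("|", @row{@header} ), "\n";
}
close ( $second_file );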
That way you don't care if the order is wrong, because you forward fix it.
If you really need to compare, then you can iterate each of the @header arrays and look for differences. But that's more a question of what you're actually trying to get - I would suggest looking at Array::Utils because that lets you trivially use array_diff, intersect and unique.
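If you do want the comparison itself, iterating over both header arrays and reporting every difference can also be sketched in awk, as in the question. This is only a sketch under a few assumptions: `check_header` is a hypothetical wrapper name, the trailing carriage return is stripped on the guess that the data file has CRLF line endings, and the loop starts at 1 because `split()` fills its array from index 1. There is no `next` inside the loop, so all mismatching fields are reported rather than only the first.

```shell
#!/bin/sh
# check_header KEYFILE DATAFILE   (hypothetical wrapper name)
# Prints one line per mismatching header field; returns 1 if any were found.
check_header() {
    awk -F'|' '
        # Strip a trailing carriage return (assumed CRLF input); this re-splits $0.
        { sub(/\r$/, "") }
        # First file: remember the expected field names.
        FNR == NR { hn = split($0, header); next }
        # First line of the second file: compare field by field.
        FNR == 1 {
            n = split($0, fh)
            bad = 0
            max = (hn > n) ? hn : n
            for (i = 1; i <= max; i++)
                if (fh[i] != header[i]) {
                    printf "%s: field %d is \"%s\", expected \"%s\"\n", FILENAME, i, fh[i], header[i]
                    bad++
                }
            if (!bad)
                print FILENAME, "has expected order of fields"
            # Data rows are never read; stop after the header line.
            exit (bad > 0)
        }
    ' "$1" "$2"
}
```

Usage would be `check_header key /Scripts/gst/Kenan_Test_Scenarios1.txt`. A missing or extra field shows up as a mismatch against an empty name.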
Upvotes: 2