novice

Reputation: 165

Compare semicolon separated data in 2 files using shell script

I have a text file, temp1.txt, with close to 240 rows of semicolon-separated data. A second file, temp2.txt, holds 204 rows in the same format.

I want to:

  1. Sort the data in both files by field1, i.e. the first data field in every row.
  2. Compare the data in both files and redirect the rows that are not equal into separate files.

Sample data:

temp1.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00

temp2.txt
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94

The sort command I'm using:

sort -k1,1 temp1 -o temp1.tmp
sort -k1,1 temp2 -o temp2.tmp

I'd appreciate it if someone could show me how to redirect only the missing or mismatched rows into two separate files for analysis.

Upvotes: 1

Views: 598

Answers (4)

pixelbeat

Reputation: 31708

You want the set difference, as described at http://www.pixelbeat.org/cmdline.html#sets

sort -t';' -k1,1 temp1 temp1 temp2 | uniq -u > only_in_temp2
sort -t';' -k1,1 temp1 temp2 temp2 | uniq -u > only_in_temp1

Notes:

  • Use join rather than uniq, as shown at the link above, if you want to compare only particular fields
  • If the first field is fixed width then you don't need the -t';' -k1,1 params above
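
To illustrate the join note above, here is a minimal sketch that compares on the first field only. It assumes GNU or POSIX join and sort; the output names key_only_in_temp1 and key_only_in_temp2 are made up for this example, and the last value of the 682 row in temp2 has been altered on purpose to show that join ignores differences outside the key.

```shell
# Sample data based on the question; the 682 row in temp2 is deliberately
# altered (999.99) for illustration. Normally temp1 and temp2 already exist.
cat > temp1 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00
EOF
cat > temp2 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;999.99
EOF

# join needs both inputs sorted on the join field in the same collation,
# so the locale is pinned for sort and join alike.
LC_ALL=C sort -t';' -k1,1 temp1 > temp1.sorted
LC_ALL=C sort -t';' -k1,1 temp2 > temp2.sorted

# -v 1 prints rows of the first file whose key has no partner in the second.
LC_ALL=C join -t';' -v 1 temp1.sorted temp2.sorted > key_only_in_temp1
# -v 2 does the same for the second file.
LC_ALL=C join -t';' -v 2 temp1.sorted temp2.sorted > key_only_in_temp2
```

Because only the key is compared, the altered 682 row is not reported here. The sort | uniq -u commands above compare whole lines and would report it.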

Upvotes: 1

rjp

Reputation: 1958

Look at the comm command.
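
A minimal sketch of how comm could be applied here, using the sample rows from the question. The names temp1.sorted, temp2.sorted, only_in_temp1 and only_in_temp2 are placeholders chosen for this example, and whole lines are compared rather than just the first field.

```shell
# Sample data taken from the question (normally temp1 and temp2 already exist).
cat > temp1 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00
EOF
cat > temp2 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
EOF

# comm needs both inputs sorted in the same collation order, so pin the locale.
LC_ALL=C sort temp1 > temp1.sorted
LC_ALL=C sort temp2 > temp2.sorted

# -23 suppresses columns 2 and 3, leaving lines found only in the first file.
LC_ALL=C comm -23 temp1.sorted temp2.sorted > only_in_temp1
# -13 suppresses columns 1 and 3, leaving lines found only in the second file.
LC_ALL=C comm -13 temp1.sorted temp2.sorted > only_in_temp2
```

With the sample data, only_in_temp1 should hold the 651 row and only_in_temp2 should be empty.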

Upvotes: 1

ghostdog74

Reputation: 342303

Using gawk, output the lines in file1 whose first field does not appear in file2:

awk -F";" 'FNR==NR{  a[$1]=$0;next }
( ! ( $1 in a)  ) {  print $0 > "afile.txt" }' file2 file1

Swap the order of file2 and file1 to output the lines in file2 that are not in file1.
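
A variation on the same idea, as a sketch only: keying the array on the whole line ($0) instead of $1 also catches rows whose key matches but whose values differ. The output names not_in_temp2 and not_in_temp1 are made up for this example, and the 682 row in temp2 is altered on purpose. Note that the FNR==NR idiom assumes the first file is not empty.

```shell
# Sample data based on the question; the 682 row in temp2 is deliberately
# altered (999.99) for illustration. Normally temp1 and temp2 already exist.
cat > temp1 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;216.94
1000xyz430200xyzA00651xyz0;146.70;0.00;0.00;0.00;0.00;0.00
EOF
cat > temp2 <<'EOF'
1000xyz400100xyzA00680xyz0;19722.83;19565.7;157.13;11;2.74;11.00
1000xyz400100xyzA00682xyz0;7210.68;4111.53;3099.15;216.95;1.21;999.99
EOF

# First file on the command line: remember every full line.
# Second file: print the lines that were never seen in the first.
awk 'FNR==NR { seen[$0]; next } !($0 in seen)' temp2 temp1 > not_in_temp2
awk 'FNR==NR { seen[$0]; next } !($0 in seen)' temp1 temp2 > not_in_temp1
```

Here not_in_temp2 should contain the original 682 row and the 651 row, while not_in_temp1 should contain the altered 682 row.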

Upvotes: 0

abbot

Reputation: 27850

Try

cat temp1 temp2 | sort -k1,1 -o tmp
# mis-matching/missing rows:
uniq -u tmp
# matching rows:
uniq -d tmp

Upvotes: 3
