Eduardo

Reputation: 397

Awk - Compare two files, match a field, merge both files

Hey guys, I need some help here. My goal is to find the first field of each line of File1 in File2 and merge the matching lines.

File1:

\\tempcomputer\c$\test2;test folder;c:\test2
\\tempcomputer\c$\temp;temp folder;C:\temp
\\tempcomputer\c$\unavailablefolder;c:\unavailablefolder

File2:

\\tempcomputer\c$\test2\;2.777.768 Bytes;11/09/12;11/09/12
\\tempcomputer\c$\temp\;5.400.050.974 Bytes;10/09/12;11/09/12
Error: Invalid property element: \\tempcomputer\c$\unavailablefolder

Expected output:

\\tempcomputer\c$\test2;test folder;c:\test2;2.777.768 Bytes;11/09/12;11/09/12
\\tempcomputer\c$\temp;temp folder;C:\temp;5.400.050.974 Bytes;10/09/12;11/09/12
\\tempcomputer\c$\unavailablefolder;c:\unavailablefolder;Error: Invalid property element: \\tempcomputer\c$\unavailablefolder

For example, I would like to take the first field of the first line of File1:

\\tempcomputer\c$\test2 

search for it in File2, and concatenate the matching lines. From File1:

\\tempcomputer\c$\test2;test folder;c:\test2 

and from File2:

c:\test2;2.777.768 Bytes;11/09/12;11/09/12

So the expected result for the first line would be:

\\tempcomputer\c$\test2;test folder;c:\test2;2.777.768 Bytes;11/09/12;11/09/12

Expected result for the second line:

\\tempcomputer\c$\temp;temp folder;C:\temp;5.400.050.974 Bytes;10/09/12;11/09/12

Expected result for the third line:

\\tempcomputer\c$\unavailablefolder;c:\unavailablefolder;Error: Invalid property element: \\tempcomputer\c$\unavailablefolder

Upvotes: 1

Views: 1100

Answers (2)

Thor

Reputation: 47099

If, as c00kiemon5ter indicates, the backslashes are a copy-paste error, it is a simple matter of iterating through File2 for each line in File1. I assume you want no output when no match is found.

simple.awk

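BEGIN { FS = OFS = ";" }

{
  l = $0          # save the File1 line; getline below overwrites $0
  first = $1
  # "> 0" stops on EOF (0) and on read errors (-1)
  while ((getline < "File2") > 0) {
    if (first == $1) {
      print l, $0
      break
    }
  }
  close("File2")  # rewind so the next File1 line scans File2 from the top
}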

Run with:

awk -f simple.awk File1

Allowing an optional backslash at the end of the path takes a bit more work, but most of the extra complexity can be moved into a function:

more-work.awk

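# Return s without a single trailing c (default: a backslash).
function optional_end(s, c) {
  if (c == "")
    c = "\\"
  if (substr(s, length(s)) == c)
    s = substr(s, 1, length(s) - 1)
  return s
}

BEGIN { FS = OFS = ";" }

{
  l = $0
  first = optional_end($1)

  # "> 0" stops on EOF (0) and on read errors (-1)
  while ((getline < "File2") > 0) {
    if (first == optional_end($1)) {
      print l, $0
      break
    }
  }
  close("File2")  # rewind so the next File1 line scans File2 from the top
}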

Run with:

awk -f more-work.awk File1

edit by c00kiemon5ter :3

Revised simple.awk.
It works when the first field in File2 ends with a backslash (\;) and also joins the third line (the error line).

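BEGIN { FS = OFS = ";"; if (file == "") file = "File2" }

{
  l = $0
  first = $1
  # "> 0" stops on EOF (0) and on read errors (-1)
  while ((getline < file) > 0) {
    if ((idx = index($0, first))) {
      if (idx == 1)
        $1 = l          # path at line start: replace it with the File1 line
      else
        $1 = l FS $0    # path inside the line (e.g. error line): prepend the File1 line
      print
      break
    }
  }
  close(file)           # rewind so the next File1 line scans the file from the top
}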

edit 2

The second file can now be given with the option -v file=SOME_FILE; if none is given, "File2" is used, e.g.:

awk -f simple.awk -v file=SOME_FILE File1
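
As an alternative to re-reading the second file for every line of File1, the lookup can be sketched as a single pass over each file using an awk array. This is not the method used in the scripts above, just a minimal sketch under two assumptions: every non-error line of File2 has at most one trailing backslash on its path, and in an error line the path is everything from the first backslash onwards. The sample data is copied from the question; the variable names `tmp`, `rest`, `key` and `result` are only illustrative.

```shell
# Sample data copied from the question (quoted heredocs keep \ and $ literal).
tmp=$(mktemp -d)
cat > "$tmp/File1" <<'EOF'
\\tempcomputer\c$\test2;test folder;c:\test2
\\tempcomputer\c$\temp;temp folder;C:\temp
\\tempcomputer\c$\unavailablefolder;c:\unavailablefolder
EOF
cat > "$tmp/File2" <<'EOF'
\\tempcomputer\c$\test2\;2.777.768 Bytes;11/09/12;11/09/12
\\tempcomputer\c$\temp\;5.400.050.974 Bytes;10/09/12;11/09/12
Error: Invalid property element: \\tempcomputer\c$\unavailablefolder
EOF

result=$(awk '
BEGIN { FS = OFS = ";" }
# First file on the command line (File2): index each line by its path.
NR == FNR {
  if ($0 ~ /^Error: /) {
    # Assumption: the path is everything from the first backslash onwards.
    key = substr($0, index($0, "\\"))
    rest[key] = $0
  } else {
    key = $1
    sub(/\\$/, "", key)                      # drop one optional trailing backslash
    rest[key] = substr($0, length($1) + 2)   # everything after the first ";"
  }
  next
}
# Second file (File1): append the stored text when the path is known.
$1 in rest { print $0, rest[$1] }
' "$tmp/File2" "$tmp/File1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because File2 is held in memory, the order of the lines in the two files does not matter here; lines of File1 without a counterpart in File2 are silently skipped, as in the scripts above.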

Upvotes: 1

Dennis Williamson

Reputation: 359935

Assuming that there's no terminal backslash at the end of the paths in File2, the following:

join -t ';' <(sort File1) <(sort File2)

will output:

\\tempcomputer\c$\temp;temp folder;C:\temp;5.400.050.974 Bytes;10/09/12;11/09/12
\\tempcomputer\c$\test2;test folder;c:\test2;2.777.768 Bytes;11/09/12;11/09/12
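
If the paths in File2 do end in a backslash, as in the question, one possible workaround is to strip it with sed before joining. The following is only a sketch under that assumption: it removes the first `\;` on each line, sorts both inputs on the first field, and uses temporary files instead of process substitution so it also runs in a plain POSIX shell. The sample data is copied from the question; `tmp` and `joined` are illustrative names.

```shell
# Sample data copied from the question (quoted heredocs keep \ and $ literal).
tmp=$(mktemp -d)
cat > "$tmp/File1" <<'EOF'
\\tempcomputer\c$\test2;test folder;c:\test2
\\tempcomputer\c$\temp;temp folder;C:\temp
\\tempcomputer\c$\unavailablefolder;c:\unavailablefolder
EOF
cat > "$tmp/File2" <<'EOF'
\\tempcomputer\c$\test2\;2.777.768 Bytes;11/09/12;11/09/12
\\tempcomputer\c$\temp\;5.400.050.974 Bytes;10/09/12;11/09/12
Error: Invalid property element: \\tempcomputer\c$\unavailablefolder
EOF

# Turn the first "\;" on each File2 line into ";" so the join fields match,
# then sort both files on field 1. LC_ALL=C keeps sort and join in agreement
# about the collation order.
sed 's/\\;/;/' "$tmp/File2" | LC_ALL=C sort -t ';' -k1,1 > "$tmp/File2.sorted"
LC_ALL=C sort -t ';' -k1,1 "$tmp/File1" > "$tmp/File1.sorted"

joined=$(LC_ALL=C join -t ';' "$tmp/File1.sorted" "$tmp/File2.sorted")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

As with the command above, lines without a partner are dropped, so the "unavailablefolder" line does not appear. Adding `-a 1` to join would keep unpaired File1 lines, but without the error text appended.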

Upvotes: 2
