Shawn Tang

Reputation: 35

How to use awk or sed to combine match content from two files

I have two files, file1 and file2.
I need to search file2 for each line of file1 and, when a line matches, combine the two into a different format.
My progress so far:

awk 'FNR==NR{ids[$0]=$0;next}{for(id in ids){if($0 ~ "\\y"id"\\y"){print "- name:"id; print "  version: " ; print}}}' file1 file2

file1:

attr-2.4.48
bzip2-1.0.8
curl-7.71.1
dnsmasq-2.86
dropbear-2022.82
elfutils-0.179
ethtool-5.4

file2:

  url: https://sourceforge.net/projects/lzmautils/files/xz-5.2.5.tar.gz/download
  url: http://download.savannah.nongnu.org/releases/attr/attr-2.4.48.tar.gz
  url: https://sourceware.org/pub/bzip2/bzip2-1.0.8.tar.gz
  url: https://curl.se/download/curl-7.71.1.tar.bz2
  url: https://sourceware.org/elfutils/ftp/0.179/elfutils-0.179.tar.bz2
  url: https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/snapshot/ethtool-5.4.tar.gz

Expected output:

- name: attr
  version: 2.4.48
  url: http://download.savannah.nongnu.org/releases/attr/attr-2.4.48.tar.gz
- name: bzip2
  version: 1.0.8
  url: https://sourceware.org/pub/bzip2/bzip2-1.0.8.tar.gz
- name: curl
  version: 7.71.1
  url: https://curl.se/download/curl-7.71.1.tar.bz2
- name: dnsmasq
  version: 2.86
  url:
- name: dropbear
  version: 2022.82
  url:
- name: elfutils
  version: 0.179
  url: https://sourceware.org/elfutils/ftp/0.179/elfutils-0.179.tar.bz2
- name: ethtool
  version: 5.4
  url: https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/snapshot/ethtool-5.4.tar.gz

Upvotes: 2

Views: 149

Answers (4)

anubhava

Reputation: 784998

This awk command should work for you. It uses a different field separator for each of the two input files:

awk '
FNR == NR {
   u = $0
   sub(/(\.[a-z][[:alnum:]]*)+(\/[^\/]+)?$/, "")
   a[$NF] = u
   next
}
{
   print "- name:", $1
   print "  version:", $2
   print "  " ($0 in a ? a[$0] : "url:")
}' FS='/' file2 FS='-' file1

Output:

- name: attr
  version: 2.4.48
  url: http://download.savannah.nongnu.org/releases/attr/attr-2.4.48.tar.gz
- name: bzip2
  version: 1.0.8
  url: https://sourceware.org/pub/bzip2/bzip2-1.0.8.tar.gz
- name: curl
  version: 7.71.1
  url: https://curl.se/download/curl-7.71.1.tar.bz2
- name: dnsmasq
  version: 2.86
  url:
- name: dropbear
  version: 2022.82
  url:
- name: elfutils
  version: 0.179
  url: https://sourceware.org/elfutils/ftp/0.179/elfutils-0.179.tar.bz2
- name: ethtool
  version: 5.4
  url: https://git.kernel.org/pub/scm/network/ethtool/ethtool.git/snapshot/ethtool-5.4.tar.gz
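
The per-file separators work because awk evaluates each `var=value` operand only when it reaches it in the argument list, so a separator applies to the files that follow it. A minimal sketch of just that mechanism, using two throwaway files (`urls.txt` and `pkgs.txt` are hypothetical names, not the asker's files):

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)
printf '%s\n' 'https://curl.se/download/curl-7.71.1.tar.bz2' > "$tmpdir/urls.txt"
printf '%s\n' 'curl-7.71.1' > "$tmpdir/pkgs.txt"

# FS='/' is set just before urls.txt is read, FS='-' just before pkgs.txt.
# For each line, print the number of fields and the last field.
fs_demo=$(awk '{ print NF, $NF }' \
    FS='/' "$tmpdir/urls.txt" FS='-' "$tmpdir/pkgs.txt")
printf '%s\n' "$fs_demo"

rm -rf "$tmpdir"
```

The first line is split on `/` (5 fields, the last being the tarball name), the second on `-` (2 fields, the last being the version).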

Upvotes: 2

stevesliva

Reputation: 5655

sed 'h;s/-/\n  version: /;s/^/- name: /p;g;s/-.*//;s/^/grep /;s/$/ file2/e' file1 | sed 's/^$/  url:/'

  • h holds the original line from file1
  • s/-/\n  version: /;s/^/- name: /p prints the name and version
  • g gets the original line back
  • s/-.*//;s/^/grep /;s/$/ file2/e strips the version, then uses s///e to build a grep command and execute it
  • | sed 's/^$/  url:/' turns the empty lines left by unsuccessful lookups into an empty url: entry

Basically, this loops over the lines of file1 with sed and calls grep through s///e, which is a GNU sed extension.
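
To see the `e` flag in isolation, here is a small sketch that assumes GNU sed (other sed implementations reject the flag). It runs `echo` instead of `grep` so that it needs no input files; the `looking up:` text is only illustrative:

```shell
# After the second substitution the pattern space holds "echo looking up: attr".
# The e flag runs that as a shell command and replaces the pattern space
# with the command's output, which sed then prints as usual.
e_demo=$(printf '%s\n' 'attr-2.4.48' | sed 's/-.*//;s/^/echo looking up: /e')
printf '%s\n' "$e_demo"
```

Because the line is executed by a shell, this approach should only be used on input you trust.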

Upvotes: 1

Lars Fischer

Reputation: 10139

What you want can be done naturally in a small shell script using cut and grep.

We can use sed instead of the cut and grep combo and get a shell script like this:

cat "file1" | while read prginfo
do
    name="${prginfo%%-*}"
    version="${prginfo##*-}"
    # Drop any indentation and the "url: " label, then print the matching line.
    url="$(sed -n "/${prginfo}/ {s/^ *url: //;p}" file2)"
    printf -- "- name: %s\n  version: %s\n  url: %s\n" "$name" "$version" "$url" 
done

We loop over the lines of file1 (due to the cat, | and while read).

The name and version parts are parsed from the lines of file1 via shell parameter expansion.
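
A quick sketch of what the two expansions do on one of the sample lines. The `foo-bar-1.0` value is a hypothetical package name, added only to show how a hyphen inside the name behaves:

```shell
prginfo='dropbear-2022.82'
name="${prginfo%%-*}"      # remove the longest suffix matching "-*"
version="${prginfo##*-}"   # remove the longest prefix matching "*-"
printf '%s %s\n' "$name" "$version"

hyphenated='foo-bar-1.0'           # hypothetical name containing a hyphen
short_name="${hyphenated%%-*}"     # longest suffix removed: cuts at the first hyphen
full_name="${hyphenated%-*}"       # shortest suffix removed: cuts at the last hyphen
printf '%s %s\n' "$short_name" "$full_name"
```

If package names may contain hyphens, the single `%` form is probably the safer choice for the name, since it only cuts off the part after the last hyphen.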

The url is parsed via sed:

  • -n suppresses output from unmatched lines
  • /${prginfo}/ matches the line and applies the sed commands inside the braces:
    • the s command removes the "url: " label and p prints the modified line
  • we could replace this line with url=$(grep "$prginfo" file2 | cut -d' ' -f2), as long as the lines of file2 start with url: and are not indented

This is a bit shorter than first reading and storing file2, but it may take much longer if the files are large, because file2 is scanned again for every line of file1.
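
The grep and cut variant can be sketched as a complete loop. This is only a sketch, with a few choices that are my assumptions rather than part of the answer: `grep -F` for literal matching, `cut -d:` so that indentation in file2 does not matter, and scratch copies of a small part of the sample data:

```shell
# Scratch copies of part of the question's sample data.
workdir=$(mktemp -d)
printf '%s\n' 'curl-7.71.1' 'dnsmasq-2.86' > "$workdir/file1"
printf '%s\n' '  url: https://curl.se/download/curl-7.71.1.tar.bz2' > "$workdir/file2"

combined=$(
    while IFS= read -r prginfo
    do
        name="${prginfo%%-*}"
        version="${prginfo##*-}"
        # -F treats the search string literally, so dots are not wildcards.
        # cut keeps everything after the first colon, leading space included,
        # which is why the url line below has no space after "url:".
        url="$(grep -F -- "$prginfo" "$workdir/file2" | cut -d: -f2-)"
        printf '%s\n' "- name: $name" "  version: $version" "  url:$url"
    done < "$workdir/file1"
)
printf '%s\n' "$combined"

rm -rf "$workdir"
```

Like the sed version, this scans file2 once per line of file1, so the same caveat about large files applies.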

Upvotes: 2

RavinderSingh13

Reputation: 133458

With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any version of awk.

awk '
FNR==NR{
  arr1[$1]=$2
  next
}
{
  for(i in arr1){
    if(index($0,i)){
      arr2[i]
      print "- name: " i ORS "  version: " arr1[i] ORS  $0
      break;
    }
  }
}
END{
  for(i in arr1){
    if(!(i in arr2)){
      print "- name: " i ORS "  version: " arr1[i] ORS "  url:"
    }
  }
}
' FS="-" file1 file2
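
One reason index() is a reasonable choice here: it does a plain substring search, whereas a regex match treats the dots in a version number as wildcards. A small sketch of the difference (the `curl-7x71y1.tar` string is made up for illustration and does not come from file2):

```shell
idx_demo=$(awk 'BEGIN {
    s = "curl-7x71y1.tar"                      # hypothetical string
    print index(s, "7.71.1")                   # literal search: not found
    print (s ~ /7.71.1/)                       # regex: each dot matches any character
    print index("curl-7.71.1.tar", "7.71.1")   # literal search: position of the match
}')
printf '%s\n' "$idx_demo"
```

The flip side is that index() has no notion of word boundaries, so a short name can also be found inside a longer, unrelated one.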

Upvotes: 1
