Anymaro

Reputation: 1

Bash - duplicate lines with different values

In Bash I have a file with following content:

05-24-2021,DIS,HostXYZ,   ,,No connection possible,
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1  ,No connection possible,
06-18-2021,SID,"Host1:32115,Host2:32115",  ,SOME TEXT,
06-18-2021,SID,HostZZZ,  ,,
06-18-2021,SID,"Host3:32115,Host4:32115",  ,ALSO SOME TEXT,

I am looking for a way to duplicate each line that contains two quoted hosts, splitting it into two separate lines as follows:

05-24-2021,DIS,HostXYZ,   ,,No connection possible
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1  ,No connection possible,
06-18-2021,SID,Host1,  ,SOME TEXT,
06-18-2021,SID,Host2,  ,SOME TEXT,
06-18-2021,SID,HostZZZ,  ,,
06-18-2021,SID,Host3,  ,ALSO SOME TEXT,
06-18-2021,SID,Host4,  ,ALSO SOME TEXT,

Do you have any idea how to transform that? I would really appreciate it. I tried awk, sed and so on. With "sed 's/:/:/p'" I was at least able to duplicate the affected lines, but I could not "filter out" the specific values.

Upvotes: 0

Views: 150

Answers (4)

Pierre François

Reputation: 6061

In awk, you can do it as follows:

awk 'BEGIN{FS="\""}
  (NF == 1) {print}                # no quoted field: print the line as-is
  (NF > 2) {
    n = split($2, h, ",")          # split the quoted host list on commas
    for (i = 1; i <= n; i++) {
      gsub(/:[0-9]*/, ",", h[i])   # turn the ":port" suffix into the separating comma
      print $1 h[i] $3
    }
  }' input

Upvotes: 0

Ed Morton

Reputation: 203413

With GNU awk for FPAT:

$ cat tst.awk
BEGIN {
    FPAT = "[^,]*|\"[^\"]*\""
    OFS = ","
}
$3 ~ /,/ {
    gsub(/"/,"",$3)
    n = split($3,hosts,/,/)
    for (i=1; i<=n; i++) {
        sub(/:.*/,"",hosts[i])
        $3 = hosts[i]
        print
    }
    next
}
{ print }

$ awk -f tst.awk file
05-24-2021,DIS,HostXYZ,   ,,No connection possible,
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1  ,No connection possible,
06-18-2021,SID,Host1,  ,SOME TEXT,
06-18-2021,SID,Host2,  ,SOME TEXT,
06-18-2021,SID,HostZZZ,  ,,
06-18-2021,SID,Host3,  ,ALSO SOME TEXT,
06-18-2021,SID,Host4,  ,ALSO SOME TEXT,

If you'll be doing any more complicated CSV parsing with awk, then see "What's the most robust way to efficiently parse CSV using awk?".

Upvotes: 2

Amessihel

Reputation: 6384

If you have at most two hosts per line, surrounded by double quotes, a simple way is to get the duplicates with capture groups:

sed -r 's/^(.*)"(Host[^":]*):[^",]*,(Host[^":]*):[^",]*"(.*)$/\1\2\4\n\1\3\4/'

Regex:

  • s/a/b/: substitutes text matched by the regex a with b
  • ^(.*)": captures the beginning of the line up to a double quote (group 1)
  • (Host[^":]*):[^",]*,(Host[^":]*):[^",]*": matches, after the double quote, text describing two hosts up to the closing double quote; captures each host name (groups 2 and 3)
  • (.*)$: captures the remaining text of the line

Substitution: replaces each matched line with two lines (separated by a newline \n), each beginning with captured text 1, followed by captured text 2 on the first line and captured text 3 on the second; each line ends with the last captured text 4, i.e. the remaining text of the matched line.

This answer relies on the sample you provided. It won't cover untold edge cases.
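Applied to one of the quoted sample lines, the command expands it as described (the -r flag and the \n in the replacement assume GNU sed):

```shell
printf '%s\n' '06-18-2021,SID,"Host1:32115,Host2:32115",  ,SOME TEXT,' |
sed -r 's/^(.*)"(Host[^":]*):[^",]*,(Host[^":]*):[^",]*"(.*)$/\1\2\4\n\1\3\4/'
# 06-18-2021,SID,Host1,  ,SOME TEXT,
# 06-18-2021,SID,Host2,  ,SOME TEXT,
```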

Upvotes: 0

William Pursell

Reputation: 212248

It is a really bad idea to try this with awk. Use a proper CSV parser. But, although it's fragile, you can try:

awk 'NF<2; NF>2{n=split($2,a,",")
    for(i=1;i<=n;i++){sub(/:[0-9]*$/,"",a[i]); printf("%s%s%s\n", $1,a[i],$3)}}' FS=\" input

Upvotes: 0
