Reputation: 1
In Bash, I have a file with the following content:
05-24-2021,DIS,HostXYZ, ,,No connection possible,
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1 ,No connection possible,
06-18-2021,SID,"Host1:32115,Host2:32115", ,SOME TEXT,
06-18-2021,SID,HostZZZ, ,,
06-18-2021,SID,"Host3:32115,Host4:32115", ,ALSO SOME TEXT,
I am looking for a way to duplicate any line that contains two hosts and split it into two lines, as follows:
05-24-2021,DIS,HostXYZ, ,,No connection possible,
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1 ,No connection possible,
06-18-2021,SID,Host1, ,SOME TEXT,
06-18-2021,SID,Host2, ,SOME TEXT,
06-18-2021,SID,HostZZZ, ,,
06-18-2021,SID,Host3, ,ALSO SOME TEXT,
06-18-2021,SID,Host4, ,ALSO SOME TEXT,
Do you have any idea how to transform that? I would really appreciate it. I tried awk, sed, and so on. With "sed 's/:/:/p'" I was at least able to duplicate the affected lines, but I could not "filter out" the specific values.
Upvotes: 0
Views: 150
Reputation: 6061
In awk, you can do it as follows:
awk 'BEGIN{FS="\""}
(NF == 1) {print}               # no quoted field: pass the line through
(NF > 2) {                      # quoted field: $2 holds the host list
    n = split($2, h, ",")
    for (i = 1; i <= n; i++) {
        gsub(/:[0-9]*/, "", h[i])   # drop the :port suffix
        print $1 h[i] $3
    }
}' input
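To sanity-check this end-to-end, here is a minimal sketch; two of the sample lines are written to a throwaway file, named sample.csv here as an assumption:

```shell
# Throwaway copy of two sample lines (file name sample.csv is an assumption).
cat > sample.csv <<'EOF'
06-18-2021,SID,"Host1:32115,Host2:32115", ,SOME TEXT,
06-18-2021,SID,HostZZZ, ,,
EOF

# Split each line on double quotes: plain lines end up with one field,
# lines with a quoted host list end up with three.
awk 'BEGIN{FS="\""}
(NF == 1) {print}
(NF > 2) {
    n = split($2, h, ",")
    for (i = 1; i <= n; i++) {
        gsub(/:[0-9]*/, "", h[i])   # strip the :port suffix
        print $1 h[i] $3
    }
}' sample.csv
```

This prints one output line per host (06-18-2021,SID,Host1, ,SOME TEXT, and so on), with the unquoted line passed through unchanged.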
Upvotes: 0
Reputation: 203413
With GNU awk for FPAT:
$ cat tst.awk
BEGIN {
    FPAT = "[^,]*|\"[^\"]*\""
    OFS = ","
}
$3 ~ /,/ {
    gsub(/"/,"",$3)
    n = split($3,hosts,/,/)
    for (i=1; i<=n; i++) {
        sub(/:.*/,"",hosts[i])
        $3 = hosts[i]
        print
    }
    next
}
{ print }
$ awk -f tst.awk file
05-24-2021,DIS,HostXYZ, ,,No connection possible,
05-24-2021,DIS,HostABC,BWV ,No connection possible,
06-18-2021,SID,HostWER,SE1 ,No connection possible,
06-18-2021,SID,Host1, ,SOME TEXT,
06-18-2021,SID,Host2, ,SOME TEXT,
06-18-2021,SID,HostZZZ, ,,
06-18-2021,SID,Host3, ,ALSO SOME TEXT,
06-18-2021,SID,Host4, ,ALSO SOME TEXT,
If you'll be doing any more complicated CSV parsing with awk then see What's the most robust way to efficiently parse CSV using awk?.
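To see what that FPAT value actually does, here is a small sketch (run with gawk, since FPAT is GNU-awk-only) that prints the first three fields of one quoted sample line:

```shell
# FPAT describes what a field looks like: either a run of non-comma
# characters, or a double-quoted string (which may contain commas).
echo '06-18-2021,SID,"Host1:32115,Host2:32115", ,SOME TEXT,' |
gawk 'BEGIN{FPAT = "[^,]*|\"[^\"]*\""} {for (i = 1; i <= 3; i++) print i ": " $i}'
```

Field 3 comes back as the whole quoted string, quotes included, which is why the script strips them with gsub(/"/,"",$3) before splitting on commas.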
Upvotes: 2
Reputation: 6384
If you have at most two hosts per line, surrounded by double quotes, a simple way is to get the duplicates with capture groups:
sed -r 's/^(.*)"(Host[^":]*):[^",]*,(Host[^":]*):[^",]*"(.*)$/\1\2\4\n\1\3\4/'
Regex:
- s/a/b/ : performs substitution of text matched by regex a with b
- ^(.*)" : captures the beginning of the line up to a double quote (group 1)
- (Host[^":]*):[^",]*,(Host[^":]*):[^",]*" : finds, after the double quote, text describing two hosts and ending with a double quote; captures each host name (groups 2 and 3)
- (.*)$ : captures the remaining text of the line

Substitution: replace each matched line by two lines (separated by a newline \n), each beginning with captured text 1, followed by captured text 2 for the first line and 3 for the second; terminate each line with the last captured text 4, i.e. the remaining text of the matched line.
This answer relies on the sample you provided. It won't cover untold edge cases.
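A quick way to try it, as a sketch assuming GNU sed (whose -r option and \n-in-replacement this command relies on):

```shell
# Feed one two-host line and one plain line through the substitution.
printf '%s\n' \
  '06-18-2021,SID,"Host1:32115,Host2:32115", ,SOME TEXT,' \
  '06-18-2021,SID,HostZZZ, ,,' |
sed -r 's/^(.*)"(Host[^":]*):[^",]*,(Host[^":]*):[^",]*"(.*)$/\1\2\4\n\1\3\4/'
```

The quoted line comes out as two lines, one per host, while the plain line is left untouched.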
Upvotes: 0
Reputation: 212248
It is a really bad idea to try this with awk. Use a proper CSV parser. But, although it's fragile, you can try:
awk 'NF<2;
     NF>2 {n = split($2, a, ",")
           for (i = 1; i <= n; i++) {
               sub(/:.*/, "", a[i])          # drop the :port suffix
               printf "%s%s%s\n", $1, a[i], $3
           }}' FS=\" input
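The "proper CSV parser" advice can be sketched from the shell with Python's standard csv module (assuming python3 is available; any real CSV library would do as well):

```shell
# csv.reader handles the quoted, comma-containing third column correctly,
# so the splitting logic itself becomes trivial.
printf '%s\n' \
  '06-18-2021,SID,"Host1:32115,Host2:32115", ,SOME TEXT,' \
  '06-18-2021,SID,HostZZZ, ,,' |
python3 -c '
import csv, sys
for row in csv.reader(sys.stdin):
    for host in row[2].split(","):   # one output row per host
        out = row[:]
        out[2] = host.split(":")[0]  # drop the :port suffix
        print(",".join(out))
'
```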
Upvotes: 0