Reputation: 1335
I am looking for the expected output below, based on the sample provided.
Sample
eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000
Expected Output
"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"~"a~b~c"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"
"444"~"ramdev ""abc def"""~"ram"~"10000"
Current code:
awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' sample
The current code doesn't work for the last line: the embedded double quotes in ramdev "abc def" are not doubled. This is an enhancement of the question "insert quotes for each field using awk".
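A quick way to see the failure is to run the question's awk command on the last sample line alone. Because ramdev "abc def" ends with a double quote, the test $i!~/^\"|\"$/ is false, so that field is treated as already quoted and left untouched, and its embedded quotes are never doubled. The wrapper name quote_with_awk is only an illustrative helper, not part of the question.

```shell
#!/bin/sh
# quote_with_awk: hypothetical wrapper around the question's awk command.
# A field that starts or ends with " is assumed to be quoted already and is left as-is.
quote_with_awk() {
  awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1'
}

# Only the last sample line: the second field keeps its unescaped inner quotes.
echo '444~ramdev "abc def"~ram~10000' | quote_with_awk
# prints: "444"~ramdev "abc def"~"ram"~"10000"
```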
Upvotes: 0
Views: 130
Reputation: 58578
This might work for you (GNU sed):
cat <<\! | sed -Ef - file
:a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta; #1
s/.*/~&/ #2
s/~"([^"]*)"/~\1/g #3
s/"/""/g #4
s/.// #5
s/[^~]*/"&"/g #6
y/\n/~/; #7
!
This sed script works as follows:
1. A ~ within a quoted string can be confused with a field delimiter, so it needs to be replaced by a unique character which is not present in the current line. As sed uses newlines to delimit its input, a newline cannot be present in the pattern space when a line is read, and is therefore the perfect choice for such a character. Fields consist of three types of strings:
   a) Strings which do not start and end with double quotes and contain no quoted strings.
   b) Double-quoted strings.
   c) Strings which do not start and end with double quotes but have quoted strings within them.
   Any ~ inside double quotes (in fields of type b or c) needs to be replaced by a \n. This is achieved by looping through the current line, skipping fields that contain no quoted ~ and replacing one quoted ~ per pass until none is left.
2. To make the next step easier, introduce a field delimiter before the first field by prepending a ~ to the line.
3. Remove all double quotes enclosing fields (see 1b).
4. All double quotes remaining are within strings of type 1c and can be escaped by doubling them, i.e. by prefixing each one with another ".
5. Now remove the initial field delimiter introduced in step 2.
6. Surround all fields with double quotes.
7. Replace the newlines introduced in step 1 with their original value, i.e. ~.
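To watch the steps above in action, the same script can be run with GNU sed's l command after each step; l prints the pattern space unambiguously, showing the temporary newline as \n and marking the end with $. The function name trace_steps is a hypothetical helper for this illustration only; the sed commands themselves are the answer's, unchanged.

```shell
#!/bin/sh
# trace_steps: hypothetical helper that runs the answer's seven steps with GNU sed
# and prints the pattern space (via `l`) after each of steps 1 to 6.
trace_steps() {
  sed -nE '
    # 1: turn a ~ lying inside double quotes into \n; loop while that succeeds
    :a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta
    l
    # 2: prepend a delimiter so the first field also follows a ~
    s/.*/~&/
    l
    # 3: strip the enclosing quotes of fields that begin with a quote
    s/~"([^"]*)"/~\1/g
    l
    # 4: double every remaining (embedded) quote
    s/"/""/g
    l
    # 5: drop the delimiter added in step 2
    s/.//
    l
    # 6: wrap every field in quotes
    s/[^~]*/"&"/g
    l
    # 7: restore the protected ~ and print the result
    y/\n/~/
    p
  '
}

echo '245~"abc ~ def"~hyd~560102' | trace_steps
```

For this sample line the intermediate lines show the quoted ~ as \n until step 7, step 4 has nothing to do because no quotes remain after step 3, and the last line printed is "245"~"abc ~ def"~"hyd"~"560102", matching the expected output.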
N.B. It appears that GNU sed has a bug whereby, if the translate command (y/../../) is the last command in a script or a one-line command, it needs to be suffixed by a ;.
The above solution can be entered on one long line:
sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' file
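As a quick check, the one-liner can be wrapped in a shell function and fed the question's sample lines; with GNU sed its output should match the Expected Output block line for line. The name quote_fields is a hypothetical wrapper used only for this illustration.

```shell
#!/bin/sh
# quote_fields: hypothetical wrapper around the one-line GNU sed command above.
# It reads the file(s) given as arguments, or standard input if there are none.
quote_fields() {
  sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' "$@"
}

# Feed it the sample from the question.
quote_fields <<'EOF'
eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000
EOF
```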
Upvotes: 2