Reputation: 673
I am trying to use awk to extract a block of XML from a shell variable that holds CSV data.
I get the CSV from a web service that returns the following:
2;1;"<?xml version=""1.0"" encoding=""UTF-8""?>
<project name=""ETLTasks"" version=""6.0"" modified=""1479827853273"" modifiedBy=""admin"" format=""strict"" olapId=""p0"">
<headers>
<header name=""comment"" modified=""1394702840960"" modifiedBy="""">
<comment><![CDATA[Automated tasks for OLAP Server:
- CubeCopy
- CubeRulesCalc]]></comment>
</header>
</headers>
</project>
";
I am trying to use awk to extract the XML. I would like each doubled double quote to be replaced by a single double quote (format=""strict"" => format="strict").
For now I have the following, but it does not replace the doubled double quotes as intended:
etlDefinitionClean=`echo -n "$etlDefinition" | cut -d";" -f3`
etlDefClean="${etlDefinitionClean%\"}"
etlDefClean="${etlDefClean#\"}"
awk -F "\"*;\"*" '{ gsub(/\"\"/, "\"", $2) } {print $2}' "$etlDefClean" > "$fileOut"
What I want to achieve in the end is the following:
<project name="ETLTasks" version="6.0" modified="1479827853273" modifiedBy="admin" format="strict" olapId="p0">
<headers>
<header name="comment" modified="1394702840960" modifiedBy="">
<comment><![CDATA[Automated tasks for OLAP Server:
- CubeCopy
- CubeRulesCalc]]></comment>
</header>
</headers>
</project>
I then want to write that to a file.
Upvotes: 0
Views: 66
Reputation: 10865
The command
awk -F '^(2;1;")|(";)' -v RS="" -v dq='""' -v q='"' '{gsub(dq,q,$2); print $2}' csvx.data
gives you the desired result:
<?xml version="1.0" encoding="UTF-8"?>
<project name="ETLTasks" version="6.0" modified="1479827853273" modifiedBy="admin" format="strict" olapId="p0">
<headers>
<header name="comment" modified="1394702840960" modifiedBy="">
<comment><![CDATA[Automated tasks for OLAP Server:
- CubeCopy
- CubeRulesCalc]]></comment>
</header>
</headers>
</project>
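The question keeps the CSV in a shell variable rather than a file, so the same command can also be fed through a pipe. A minimal sketch, assuming the variable names etlDefinition and fileOut from the question; the sample data is shortened, and the temporary file from mktemp is only a stand-in for the real output path:

```shell
# Shortened sample in the same shape as the web service response.
etlDefinition='2;1;"<?xml version=""1.0"" encoding=""UTF-8""?>
<project name=""ETLTasks"" format=""strict"">
<headers>
<header name=""comment"" modifiedBy="""">
</header>
</headers>
</project>
";'

# Stand-in output path (assumption); use the real target file instead.
fileOut=$(mktemp)

# Feed the variable to awk on stdin and write the cleaned field to the file.
printf '%s\n' "$etlDefinition" |
  awk -F '^(2;1;")|(";)' -v RS="" -v dq='""' -v q='"' \
      '{ gsub(dq, q, $2); print $2 }' > "$fileOut"

cat "$fileOut"
```

This depends on awk not treating newlines as additional field separators when FS is a regular expression, which is how gawk behaves with an empty RS. Other awk implementations may split the record differently, so it is worth checking the output on your system.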
Using -v to create the quotes is just a convenience to avoid lots of escaping. An equivalent command would be:
awk -F '^(2;1;")|(";)' -v RS="" '{gsub("\"\"", "\"", $2); print $2}' csvx.data
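To check that the two spellings of the quotes really behave the same, here is a small sketch on a made-up sample string (the variable names sample, with_vars and with_escapes are only for illustration):

```shell
# Made-up input with doubled quotes, including an empty attribute value.
sample='name=""ETLTasks"" modifiedBy=""""'

# Variant 1: pass the quote strings in through -v.
with_vars=$(printf '%s\n' "$sample" |
  awk -v dq='""' -v q='"' '{ gsub(dq, q); print }')

# Variant 2: escape the quotes inside the awk program itself.
with_escapes=$(printf '%s\n' "$sample" |
  awk '{ gsub("\"\"", "\""); print }')

echo "$with_vars"
echo "$with_escapes"
```

Both lines should print name="ETLTasks" modifiedBy="", since each pair of doubled quotes collapses to one quote.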
-v RS="" sets the record separator to the empty string, a special value that tells awk to treat each run of consecutive non-empty lines as a single record, with blank lines separating the records.
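The effect of the empty record separator is easy to see by counting records on a tiny input. A minimal sketch with made-up data (two groups of lines separated by one blank line):

```shell
# Made-up input: "a" and "b", a blank line, then "c" and "d".
sample=$(printf 'a\nb\n\nc\nd')

# Default RS: every line is its own record, the blank one included.
lines=$(printf '%s\n' "$sample" | awk 'END { print NR }')

# RS="": each run of non-empty lines becomes one record.
paragraphs=$(printf '%s\n' "$sample" | awk -v RS="" 'END { print NR }')

echo "line records: $lines, paragraph records: $paragraphs"
```

This prints "line records: 5, paragraph records: 2". Because the CSV in the question contains no blank lines, the whole response ends up in a single record.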
Upvotes: 2