Reputation: 125
We have an unusual CSV file in which some fields contain JSON data, as shown below:
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","[{""k1"":""v1"",""k2"":""v2""}]","str5"
We want to remove everything inside the square brackets and braces of the JSON field (where [{ appear together) and leave the rest of the line unchanged. However, when I use the sed command sed -e 's/\[.*\]//g', it returns undesired output:
"00001","str1","","str5"
The expected output is:
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","","str5"
We do not know how to match and replace only the part containing the JSON data, and cannot find any relevant information on how to do so.
How can we achieve this?
Upvotes: 0
Views: 65
Reputation: 203169
You shouldn't do what you're asking for, as that approach will fail given input like ...,"[{""k1"":""v1"",""foo]bar"":""v2""}]",... where the JSON just happens to contain a ]. For example, using this modified input:
$ cat file
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","[{""k1"":""v1"",""foo]bar"":""v2""}]","str5"
and the currently accepted answer, we get incorrect output that includes a field "bar"":""v2""}]" instead of just "":
$ sed 's/\[{[^]]*]//' file
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","bar"":""v2""}]","str5"
You should instead be asking how to delete the contents of a field that exactly contains a string like "[{...}]", e.g. using GNU awk for FPAT:
$ awk -v FPAT='[^,]*|("([^"]|"")*")' -v OFS=',' '
    # Replace any field that is exactly "[{...}]" with an empty quoted field.
    { for (i=1; i<=NF; i++) sub(/^"\[\{.*}]"$/,"\"\"",$i) }
1' file
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","","str5"
See What's the most robust way to efficiently parse CSV using awk? for more info on parsing CSVs with awk.
Upvotes: 0
Reputation: 11207
Your current code matches greedily from the first [ to the last ], hence removing everything in between. It also seems to have a redundant g flag.
Try this sed command:
$ sed 's/\[{[^]]*]//' input_file
"00001","str1","[a.b.c] str3, str4",true,false,"2022-04-18T12:00:00+00:00","","str5"
This matches from [{ (an opening square bracket immediately followed by an opening curly brace) up to the next closing square bracket; [^]]* matches any run of characters other than ].
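To see the difference between the greedy .* and the bounded [^]]* side by side, here is a minimal sketch on a shortened, made-up line. The sample data and the variable names line, greedy and bounded are assumptions for illustration, not the asker's real input:

```shell
# Hypothetical shortened sample line (not the real data).
line='"x","[a.b.c] s","[{""k"":""v""}]","y"'

# Greedy: \[.*\] runs from the first [ on the line to the last ].
greedy=$(printf '%s\n' "$line" | sed 's/\[.*\]//')

# Bounded: \[{ anchors on the JSON opener and [^]]* stops at the next ].
bounded=$(printf '%s\n' "$line" | sed 's/\[{[^]]*]//')

printf '%s\n' "$greedy" "$bounded"
```

The bounded form still assumes the JSON contains no ] of its own; if it might, it will cut the field short.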
Upvotes: 1