Reputation: 353
On Unix, without adding anything to the OS (i.e. using only grep, awk, sed, cut, etc.), how can I split the following input into several files (e.g. "_temp1.txt", "_temp2.txt", etc.), starting a new file at each "codeView" line? Note that the line is likely to begin with several spaces.
What if the input is coming from an API instead of an existing file?
. . .
"events" : [ {
"id" : "123456",
"important" : true,
"codeView" : {
"lines" : [ {
"fragments" : [ {
"type" : "NORMAL_CODE",
"value" : "str = wrapper.getParameter("
}, {
"type" : "NORMAL_CODE",
"value" : ")"
} ],
"text" : "str = wrapper.getParameter("motif")"
} ],
"nested" : false
},
"probableStartLocationView" : {
"lines" : [ {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "<init>() @ JSONInputData.java:12"
} ],
"text" : "<init>() @ JSONInputData.java:92"
} ],
"nested" : false
},
"dataView" : {
"lines" : [ {
"fragments" : [ {
"type" : "TAINT_VALUE",
"value" : "CP"
} ],
"text" : "{{#taint}}CP{{/taint}}"
} ],
"nested" : false
},
"collapsedEvents" : [ ],
"dupes" : 0
}, {
"id" : "28861,28862",
"important" : false,
"type" : "P2O",
"description" : "String Operations Occurred",
"extraDetails" : null,
"codeView" : {
"lines" : [ {
"fragments" : [ {
"type" : "TEXT",
"value" : "Over the following lines of code, blah blah."
} ],
"text" : "Over the following lines of code, blah blah."
} ],
"nested" : false
},
"probableStartLocationView" : {
"lines" : [ {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "remplaceString() @ O_UtilCaractere.java:234"
} ],
"text" : "remplaceString() @ O_UtilCaractere.java:234"
}, {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "replaceString() @ O_UtilCaractere.java:333"
} ],
"text" : "replaceString() @ O_UtilCaractere.java:333"
}, {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "creerIncidentPaie() @ Incidents.java:444"
} ],
"text" : "creerIncidentPaie() @ Incidents.java:219"
}, {
"fragments" : [ {
"type" : "STACKTRACE_LINE",
"value" : "repliquerAbsenceIncident() @ Incidents.java:876"
} ],
"text" : "repliquerAbsenceIncident() @ IncidentsPaieMgr.java:882"
} ],
"nested" : false
},
"dataView" : {
"lines" : [ {
"fragments" : [ {
"type" : "TEXT",
"value" : "insert into TGE_INCIDENT...4&apos;, &apos;YYYYMMDD&apos;), &apos;A&apos;, &apos;"
}, {
"type" : "TAINT_VALUE",
"value" : "CP"
}, {
"type" : "TEXT",
"value" : "&apos;, &apos;&apos;, null, &apos;T&apos;, &apos;ADPTVT&apos;, to_date(&apos;2013012214..."
} ],
"text" : "insert into TGE_INCIDENT...4&apos;, &apos;YYYYMMDD&apos;), &apos;A&apos;, &apos;{{#taint}}CP{{/taint}}&apos;, &apos;&apos;, null, &apos;T&apos;, &apos;ADPTVT&apos;, to_date(&apos;2017062214..."
} ],
"nested" : false
}
. . .
Upvotes: 1
Views: 864
Reputation: 67567
awk to the rescue!
$ awk '/"codeView"/{c++} {print > ("_temp" (c+0) ".txt")}' file
The header up to the first match will be in the 0th temp file (_temp0.txt). If there is a chance that the key also appears in the content, change the regex match to a literal match on the first field: $1=="\"codeView\""
You can also pipe the data into the awk script instead of reading from a file.
If the input produces many files, you may want to close each one before opening the next, otherwise awk can error out once it hits the limit on open files.
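To put those three remarks together, here is a minimal sketch rather than a definitive version. It assumes the key is the first whitespace-separated field on its line, and fetch_events is a hypothetical stand-in for whatever command calls your API. The sample data is cut down from the question, with one value deliberately containing the word codeView.

```shell
# fetch_events is a hypothetical stand-in for the real API call
# (for example a curl command); it only needs to write the JSON to stdout.
fetch_events() {
  printf '%s\n' \
    '"events" : [ {' \
    '  "id" : "1",' \
    '  "codeView" : {' \
    '    "text" : "a"' \
    '  },' \
    '  "id" : "2",' \
    '  "codeView" : {' \
    '    "text" : "the word codeView inside a value"' \
    '  }'
}

fetch_events | awk '
  BEGIN { out = "_temp0.txt" }        # lines before the first match go here
  $1 == "\"codeView\"" {              # first field is exactly "codeView"
    close(out)                        # release the file we just finished
    out = "_temp" (++c) ".txt"        # _temp1.txt, _temp2.txt, ...
  }
  { print > out }
'
```

Because awk strips leading blanks when it splits fields, the indentation in front of "codeView" does not matter, and the line whose value merely mentions codeView does not start a new file. Only one output file is open at any time.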
Upvotes: 2
Reputation: 204648
This will work robustly in any awk:
awk '/"codeView"/{close(out); out="_temp" ++c ".txt"} out!=""{print > out}' file
Upvotes: 4
Reputation: 113994
Try:
csplit -f _temp -b %d.tmp file '/codeView/' '{*}'
Or, if the data comes from some other program:
my_api | csplit -f _temp -b %d.tmp - '/codeView/' '{*}'
-f _temp -b %d.tmp
These two options set the names of the split files to the format that you want (_temp0.tmp, _temp1.tmp, and so on).
file
Replace this with the name of your input file. Use - if the input is to come from stdin.
/codeView/
This is the regex that you want to split on.
'{*}'
This tells csplit not to stop at the first match but to keep splitting at every match.
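If the word codeView can also show up inside a value, the regex can be anchored to the start of the line. The following is only a sketch under some assumptions: GNU csplit (-b, -z and '{*}' are GNU extensions), indentation made of spaces rather than tabs, and a printf that stands in for the real API command.

```shell
# Sample input stands in for the API output; one value mentions codeView
# on purpose to show that the anchored regex ignores it.
printf '%s\n' \
  '"events" : [ {' \
  '  "id" : "1",' \
  '  "codeView" : {' \
  '    "text" : "a"' \
  '  },' \
  '  "id" : "2",' \
  '  "codeView" : {' \
  '    "text" : "the word codeView inside a value"' \
  '  }' \
| csplit -s -z -f _temp -b '%d.tmp' - '/^ *"codeView"/' '{*}'
# -s  suppress the byte counts csplit normally prints
# -z  do not keep empty pieces (GNU extension)
```

As before, everything ahead of the first match ends up in _temp0.tmp, and each later file starts at a "codeView" line.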
Upvotes: 2